(57) Abstract



The present invention defines a DNA protein-binding assay useful for screening libraries of synthetic or biological compounds for their 
ability to bind DNA test sequences. The assay is versatile in that any number of test sequences can be tested by placing the test sequence 
adjacent to a defined protein-binding screening sequence. Binding of molecules to these test sequences changes the binding characteristics
of the protein molecule to its cognate binding sequence. When such a molecule binds the test sequence the equilibrium of the DNA:protein
complexes is disturbed, generating changes in the concentration of free DNA probe. Numerous exemplary target test sequences (SEQ ID
NO:1 to SEQ ID NO:600) are set forth. The assay of the present invention is also useful to characterize the preferred binding sequences
of any selected DNA-binding molecule.

SEQUENCE-DIRECTED DNA-BINDING MOLECULES
COMPOSITIONS AND METHODS 



Field of the Invention 

The present invention relates to methods, systems, 
and kits useful for the identification of molecules 
that specifically bind to defined nucleic acid
sequences. Also described are methods for designing
molecules having the ability to bind defined nucleic 
acid sequences and compositions thereof.

Background of the Invention

Several classes of small molecules that interact 
with double-stranded DNA have been identified. Many of 
these small molecules have profound biological effects.
For example, many aminoacridines and polycyclic 
hydrocarbons bind DNA and are mutagenic, teratogenic, 
or carcinogenic. Other small molecules that bind DNA 
include: biological metabolites, some of which have
applications as antibiotics and antitumor agents,
including actinomycin D, echinomycin, distamycin, and
calicheamicin; planar dyes, such as ethidium and
acridine orange; and molecules that contain heavy
metals, such as cisplatin, a potent antitumor drug.

The sequence binding preferences of most known DNA 
binding molecules have not, to date, been identified. 
However, several small DNA-binding molecules have been 
shown to preferentially recognize specific nucleotide
sequences, for example: echinomycin has been shown to
preferentially bind the sequence [(A/T)CGT]/[ACG(A/T)]
(Gilbert et al.); cisplatin has been shown to covalently
cross-link a platinum molecule between the N7
atoms of two adjacent deoxyguanosines (Sherman et al.);
and calicheamicin has been shown to preferentially bind
and cleave the sequence TCCT/AGGA (Zein et al.).

Many therapeutic DNA-binding molecules (such as 
distamycin) that were initially identified based on 
their therapeutic activity in a biological screen have
been later determined to bind DNA. There are several
examples in the literature referring to synthetic or
naturally-occurring polymers of DNA-binding drugs.
Netropsin, for example, is a naturally-occurring
oligopeptide that binds to the minor groove of
double-stranded DNA. Netropsin contains two
4-amino-1-methylpyrrole-2-carboxylate residues and belongs to a
family of similar biological metabolites from Streptomyces
spp. This family includes distamycin, anthelvencin
(both of which contain three N-methylpyrrole
residues), noformycin, amidomycin (both of which
contain one N-methylpyrrole residue) and kikumycin
(which contains two N-methylpyrrole residues, like
netropsin) (Debart, et al.). Synthetic molecules of
this family have also been described, including the
above-mentioned molecules (Lown, et al. 1985) as well as
dimeric derivatives (Griffin et al., Gurskii, et al.)
and certain analogues (Bialer, et al. 1980, Bialer, et
al. 1981, Krowicki, et al.).

Molecules in this family, particularly netropsin
and distamycin, have been of interest because of their
biological activity as antibacterial (Thrum et al.,
Schuhmann, et al.), antiparasitic (Nakamura et al.),
and antiviral drugs (Becker, et al., Lown, et al. 1986,
Werner, et al.).

Among the synthetic analogs of netropsin and
distamycin are oligopeptides that have been designed to
have sequence preferences different from their parent
molecules. Such oligopeptides include the "lexitropsin"
series of analogues. The N-methylpyrrole
groups of the netropsin series were systematically
replaced with N-methylimidazole residues, resulting in
lexitropsins with increased and altered sequence
specificities from the parent compounds (Kissinger, et
al.). Further, a number of poly(N-methylpyrrolyl)-netropsin
analogues have been designed and synthesized
which extend the number of residues in the oligopeptides
to increase the size of the binding site
(Dervan, 1986).

There are several different approaches that could
be taken to look for small molecules that specifically
inhibit the interaction of a given DNA-binding protein
with its binding sequence (cognate site). One approach
would be to test biological or chemical compounds for
their ability to preferentially block the binding of
one specific DNA:protein interaction but not others.
Such an assay would depend on the development of at
least two, preferably three, DNA:protein interaction
systems in order to establish controls for distinguishing
between general DNA-binding molecules (polycations
like heparin or intercalating agents like ethidium) and
DNA-binding molecules having sequence binding preferences
that would affect protein/cognate binding site
interactions in one system but not the other(s).

One illustration of how this system could be used
is as follows. Each cognate site could be placed 5' to
a reporter gene (such as genes encoding β-galactosidase
or luciferase) such that binding of the protein to the
cognate site would enhance transcription of the
reporter gene. The presence of a sequence-specific
DNA-binding drug that blocked the DNA:protein interaction
would decrease the enhancement of the reporter
gene expression. Several DNA enhancers could be
coupled to reporter genes, then each construct compared
to one another in the presence or absence of small DNA-binding
test molecules. In the case where multiple
protein/cognate binding sites are used for screening,
a competitive inhibitor that blocks one interaction but
not the others could be identified by the lack of
transcription of a reporter gene in a transfected cell
line or in an in vitro assay. Only one such DNA-binding
sequence, specific for the protein of interest,
could be screened with each assay system. This
approach has a number of limitations including limited
testing capability and the need to construct the
appropriate reporter system for each different protein/cognate
site of interest.

Another example of a system to detect sequence-specific
DNA-binding molecules would involve cloning a
DNA-binding protein of interest, expressing the protein
in an expression system (e.g., bacterial, baculovirus,
or mammalian expression systems), preparing a purified
or partially purified sample of protein, then using the
protein in an in vitro competition assay to detect
molecules that blocked the DNA:protein interaction.
These types of systems are analogous to many receptor:ligand
or enzyme:substrate screening assays
developed in the past, but have the same limitations as
outlined above in that a new system must be developed
for every different protein/cognate site combination of
interest. The capacity for screening numerous different
sequences is therefore limited.

Another example of a system designed to detect sequence-specific
DNA-binding drugs would be the use of
DNA footprinting procedures as described in the
literature. These methods include DNase I or other
nuclease footprinting (Chaires, et al.), hydroxyl
radical footprinting (Portugal, et al.), methidiumpropyl
EDTA(iron) complex footprinting (Schultz, et al.),
photofootprinting (Jeppesen, et al.), and bidirectional
transcription footprinting (White, et al.). These
procedures are likely to be accurate within the limits
of their sequence testing capability but are seriously
limited by (i) the number of different DNA sequences
that can be used in one experiment (typically one test
sequence that represents the binding site of the DNA-binding
protein under study), and (ii) the difficulty
of developing high throughput screening systems.

Summary of the Invention 

In one aspect, the invention includes a method of
constructing a DNA-binding agent capable of sequence-specific
binding to a duplex DNA target region. The
method includes identifying in the duplex DNA, a target
region containing a series of at least two non-overlapping
base-pair sequences of four base-pairs each, where
the four base-pair sequences are adjacent, and each
sequence is characterized by sequence-preferential
binding to a duplex DNA-binding small molecule. The
small molecules are coupled to form a DNA-binding agent
capable of sequence-specific binding to said target
region.

In one embodiment, the duplex-binding small 
molecules are identified as molecules capable of 
binding to a selected test sequence in a duplex DNA by
first adding a molecule to be screened to a test system
composed of (a) a DNA-binding protein that is effective
to bind to a screening sequence in a duplex DNA, with
a binding affinity that is substantially independent of
the test sequence adjacent the screening sequence, but
that is sensitive to binding of molecules to such test
sequence, when the test sequence is adjacent the
screening sequence, and (b) a duplex DNA having said
screening and test sequences adjacent one another,
where the binding protein is present in an amount that
saturates the screening sequence in the duplex DNA.

The test molecule is incubated in the test system 
for a period sufficient to permit binding of the 
molecule being tested to the test sequence in the 
duplex DNA. The amount of binding protein bound to the
duplex DNA before adding the test molecule is compared
with that after adding the molecule. The screening
sequence may be from the HSV origin of replication, and
the binding protein may be UL9. Exemplary screening
sequences are identified as SEQ ID NO:601, SEQ ID
NO:602, SEQ ID NO:615, and SEQ ID NO:641.

Specific examples of tetrameric basepair sequences
include TTTC, TTTG, TTAC, TTAG, TTGC, TTGG, TTCC, TTCG,
TATC, TATG, TAAC, TAAG, TAGC, TAGG, TACC, and TACG sequences.
A specific example of a small molecule capable of
binding to these sequences is distamycin.

In another aspect, the invention includes a method
of blocking transcriptional activity from a duplex DNA
template. The method includes identifying in the
duplex DNA, a binding site for a transcription factor
and, adjacent the binding site, a target region having
a series of at least two non-overlapping tetrameric
base-pair sequences, where the four (tetrameric) base-pair
sequences are adjacent and each sequence is
characterized by sequence-preferential binding to a
duplex DNA-binding small molecule. The sequences are
contacted with a binding agent composed of the small
molecules coupled to form a DNA-binding agent capable
of sequence-specific binding to said target region.

The target may be selected, for example, from DNA 
sequences adjacent a binding site for a eucaryotic
transcription factor, such as transcription factor
TFIID, or a procaryotic transcription factor, such as
transcription sigma factor.

For mammalian transcription factors, the target 
region is typically chosen from non-conserved regions
adjacent the transcription factor binding site. Target
regions can be chosen so that the small molecule
binding overlaps an adjacent transcription factor DNA
binding sequence (e.g., for a TFIID binding site, by 1-3
nucleotide pairs). In this case, the specificity of
DNA binding for the small molecule is essentially
derived from the non-conserved sequences adjacent the
transcription factor binding site, in order to reduce
small molecule binding at the transcription factor
binding site associated with other genes.

Also disclosed is a DNA-binding agent capable of 
binding with base-sequence specificity to a target 
region in duplex DNA, where the target region contains 
at least two adjacent four base-pair sequences. The 
agent includes at least two subunits, where each
subunit is a small molecule which has a sequence-preferential
binding affinity for a sequence of four
base-pairs in the target region. The subunits are
coupled to form a DNA-binding agent capable of sequence-specific
binding to said target region.

In one general embodiment, the agent is designed 
for binding to a sequence in which the two tetrameric 
basepair sequences are separated (for example, by up to 
20 basepairs, typically, 1 to 6 basepairs) and the 
small molecules in the agent are coupled to each other
by a spacer molecule. 

Also forming part of the invention is a method of 
constructing a binding agent capable of sequence- 
specific binding to a duplex DNA target region. The 
method includes identifying in the duplex DNA, a target
region containing (i) a series of at least two adjacent
non-overlapping base-pair sequences of four base-pairs
each, where each four base-pair sequence is characterized
by sequence-preferential binding to a duplex DNA-binding
small molecule, and (ii) adjacent to (i) a DNA
duplex region capable of forming a triplex with a
third-strand oligonucleotide. The two small molecules
are coupled to form a DNA-binding agent capable of
sequence-specific binding to said target region, and
the DNA-binding agent is attached to a third-strand
oligonucleotide.

The binding of the DNA-binding agent to duplex DNA 
causes a shift from B form to A form DNA, allowing 
triplex binding between the third-strand polynucleotide 
and a portion of the target sequence.

Also disclosed is a triple-strand forming agent 
for use in practicing the method. 

In still another aspect, the invention includes a 
method of ordering the sequence binding preferences of a
DNA-binding molecule. The method includes adding a
molecule to be screened to a test system composed of
(a) a DNA-binding protein that is effective to bind to
a screening sequence in a duplex DNA with a binding
affinity that is substantially independent of such test
sequence adjacent the screening sequence, but that is
sensitive to binding of molecules to such test sequence,
and (b) a duplex DNA having said screening and
test sequences adjacent one another, where the binding
protein is present in an amount that saturates the
screening sequence in the duplex DNA. The molecule in
the test system is incubated for a period sufficient to
permit binding of the molecule being tested to the test
sequence in the duplex DNA, and the amount of binding
protein bound to the duplex DNA before and after
addition of the test molecule is compared. These steps
are repeated using all test sequences of interest, and
the sequences are then ordered on the basis of relative
amounts of protein bound in the presence of the
molecule for each test sequence.

The test sequences are selected, for example, from
the group of 256 possible four base sequences composed
of A, G, C and T. The DNA screening sequence is
preferably from the HSV origin of replication, and the
binding protein is preferably UL9.
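
The ordering procedure described above can be sketched in code. The following is a minimal illustration only, not the method as practiced: it enumerates the 256 possible four base sequences and sorts a set of test sequences by the relative amount of protein bound in the presence of the molecule. The function names `all_tetramers` and `order_by_protein_bound`, the input format, the numeric values, and the assumption that less protein remaining bound indicates a more preferred test sequence are all hypothetical.

```python
from itertools import product

BASES = "AGCT"  # base order as listed in the text


def all_tetramers():
    """Return all 4**4 = 256 four-base sequences composed of A, G, C and T."""
    return ["".join(p) for p in product(BASES, repeat=4)]


def order_by_protein_bound(bound_fraction):
    """Order test sequences by protein bound in the presence of the molecule.

    bound_fraction maps each test sequence to the amount of protein bound
    after adding the molecule, relative to the amount bound before
    (hypothetical input format). Sequences with the least protein remaining
    bound are placed first; that direction is an assumption of this sketch.
    """
    return sorted(bound_fraction, key=lambda seq: bound_fraction[seq])


tetramers = all_tetramers()

# Made-up illustrative values, not data from the disclosure.
example = {"TTTC": 0.20, "GGCC": 0.95, "TATA": 0.35}
ranking = order_by_protein_bound(example)
```

In practice the dictionary would hold one entry per test sequence of interest, up to all 256.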

The invention also includes a method for altering
the binding characteristics of a DNA-binding protein to
a duplex DNA. In the method, a binding site for the
DNA-binding protein is identified in the duplex DNA and
a target region identified adjacent the binding site.
A small molecule is selected that is characterized by
sequence-preferential binding to the target region.
Such molecules can be selected by the assay and methods
of the present invention. When the small molecule is
bound to the target region, the small molecule is
typically adjacent to the binding site for the DNA-binding
protein. Alternatively, the binding of the
small molecule may overlap the site for the DNA-binding
protein by at least one nucleotide pair. In
the case of such overlap, the specificity of DNA
binding for the small molecule is essentially derived
from non-conserved sequences adjacent the DNA-binding
protein's binding site, in order to reduce small
molecule binding at similar DNA:protein binding sites
at other locations. Finally,
the duplex DNA is contacted with the small molecule at
a concentration effective to alter binding of the DNA-binding
protein to its binding site.

In this method, contacting the duplex DNA with a 
small molecule can either inhibit or enhance the 
binding of the DNA-binding protein to its binding site,
depending on the small molecule that is selected.
Exemplary DNA binding proteins include DNA replication 
factors and a variety of transcription factors. 

One application of this method is to eucaryotic 
general transcription factors (e.g., TFIID), where the
target region is typically selected from DNA sequences
adjacent the binding site for the eucaryotic transcription
factor (e.g., SEQ ID NO:1 to SEQ ID NO:600). In
one embodiment, the DNA binding protein is a eucaryotic
general transcription factor and the small molecule
binds, in addition to the target region, one to three
nucleotide pairs of the DNA-binding protein's binding
site. In the case of TFIID, the small molecule
typically binds to (i) the target region, and (ii) up
to two nucleotides of the binding site for TFIID, where
the nucleotides are contiguous to the target region.

Generally, the present invention provides a method 
of screening for molecules capable of binding to a 
selected test sequence in a duplex DNA. In the method 
of the present invention a test sequence of interest is
selected. Such sequences can be selected, for example,
from the group of sequences presented as SEQ ID NO:1 to
SEQ ID NO:600. Alternatively, the test sequences can
be randomly generated sequences or
defined sets of sequences, such as the group of 256
possible four base sequences composed of A, G, C and T.

A duplex DNA test oligonucleotide is constructed 
having a screening sequence adjacent a selected test 
sequence, where a DNA binding protein is effective to 
bind to the screening sequence with a binding affinity
that is substantially independent of the adjacent test
sequence. In such constructs, binding of the protein
to the screening sequence is sensitive to binding of
test molecules to the test sequence.

Molecules selected for testing/screening are added
to a test system composed of (a) the DNA binding
protein, and (b) the duplex DNA test oligonucleotide,
which contains the screening and test sequences
adjacent one another. Selected molecules are incubated
in the test system for a period sufficient to permit
binding of the molecule being tested to the test
sequence in the duplex DNA. The amount of binding
protein bound to the duplex DNA is compared before and
after adding a test molecule. Comparison of the amount
of binding protein bound to the duplex DNA before and
after adding a test molecule can be accomplished, for
example, using a gel band-shift assay or a filter-binding
assay.

In the method of the present invention a number of 
DNA:protein interactions may be used for screening
purposes. In one embodiment, the DNA screening
sequence is from the HSV origin of replication and the
binding protein is UL9. Exemplary HSV origin of
replication screening sequences include SEQ ID NO:601,
SEQ ID NO:602, SEQ ID NO:615, and SEQ ID NO:641.

Other DNA:protein interactions useful in the
practice of the present invention include restriction
endonucleases and their cognate DNA-binding sequences.
These reactions are typically carried out in the
absence of divalent cations.

In another embodiment, the invention includes a 
method of identifying test sequences in duplex DNA to 
which binding of a test molecule is most preferred. In
this method a mixture of duplex DNA test oligonucleotides
is constructed, where each oligonucleotide has a
screening sequence adjacent a test sequence as described
above. The test oligonucleotides of the
mixture typically contain different test sequences. 

A test molecule, to be screened, is added to a 
test reaction composed of (a) the DNA binding protein,
and (b) the duplex DNA test oligonucleotide mixture.
The molecule is incubated in the test reaction for a
period sufficient to permit binding of the compound
being tested to test sequences in the duplex DNA. Test
oligonucleotides are separated from test oligonucleotides
bound to binding protein.

The test oligonucleotides can be separated from 
test oligonucleotides bound to protein by, for example, 
passing the test reaction through a filter, where the 
filter is capable of capturing DNA:protein complexes
but not DNA that is free of protein. One filter type 
useful in the practice of the present invention is the 
nitrocellulose filter. 

The separated test oligonucleotides are then 
amplified. These amplified test oligonucleotides are
then recycled through the screening steps of the assay 
in order to obtain a desired degree of selection. The 
amplified test oligonucleotides are isolated and
sequenced.

Exemplary test sequences include sequences
selected from the group of 256 possible four base
sequences composed of A, G, C and T. Further examples
of desirable test sequences include test sequences
derived from the sequences presented as SEQ ID NO:1 to
SEQ ID NO:600.

The amplification step in the method may be 
accomplished by polymerase chain reaction or other 
methods of amplification, including cloning and
subsequent in vivo amplification of the cloning vector
containing the sequences of interest. 

These and other objects and features of the 
invention will be more fully appreciated when the 
following detailed description of the invention is read
in conjunction with the accompanying drawings. 

Brief Description of the Figures 

Figure 1A illustrates a DNA-binding protein
binding to a screening sequence. Figures 1B and 1C
illustrate how a DNA-binding protein may be displaced
or hindered in binding by a small molecule by two
different mechanisms: because of steric hindrance
(1B) or because of conformational (allosteric) changes
induced in the DNA by a small molecule (1C).

Figure 2 illustrates an assay for detecting 
inhibitory molecules based on their ability to preferentially
hinder the binding of a DNA-binding protein to
its binding site. Protein (O) is displaced from DNA
(/) in the presence of inhibitor (X). Two alternative
capture/detection systems are illustrated, the capture
and detection of unbound DNA or the capture and
detection of DNA:protein complexes.

Figure 3 shows a DNA-binding protein that is able
to protect a biotin moiety, covalently attached to an
oligonucleotide sequence, from being recognized by
streptavidin when a protein is bound to the DNA.

Figure 4 shows the incorporation of biotin and 
digoxigenin into a typical oligonucleotide molecule for 
use in the assay of the present invention. The oligonucleotide
contains the binding sequence (i.e., the
screening sequence) of the UL9 protein, which is
underlined, and test sequences flanking the screening
sequence. Figure 4 also shows the preparation of
double-stranded oligonucleotides end-labeled with
either digoxigenin or 32P.

Figure 5 shows a series of sequences that have 
been tested in the assay of the present invention for 
the binding of sequence-specific small molecules. 

Figure 6 outlines the cloning, into an expression
vector, of a truncated form of the UL9 protein (UL9-COOH)
which retains its sequence-specific DNA-binding
ability.

Figure 7 shows the pVL1393 baculovirus vector 
containing the full length UL9 protein coding sequence.

Figure 8 is a photograph of a SDS-polyacrylamide 
gel showing (i) the purified UL9-COOH/glutathione-S-transferase
fusion protein and (ii) the UL9-COOH
polypeptide.

Figure 9 presents data demonstrating the effect on
UL9-COOH binding of alterations in the test sequences
that flank the UL9 screening sequence.

Figure 10A shows the effect of the addition of 
several concentrations of distamycin A to DNA:protein
assay reactions utilizing different test sequences.
Figure 10B shows the effect of the addition of actinomycin
D to DNA:protein assay reactions utilizing
different test sequences. Figure 10C shows the effect
of the addition of doxorubicin to DNA:protein assay
reactions utilizing different test sequences.

Figure 11A illustrates a DNA capture system of the
present invention utilizing biotin and streptavidin-coated
magnetic beads. The presence of the DNA is
detected using an alkaline-phosphatase substrate that
yields a chemiluminescent product. Figure 11B shows a
similar reaction using biotin-coated agarose beads that
are conjugated to streptavidin, that in turn is
conjugated to the captured DNA.

Figure 12 demonstrates a test matrix based on 
DNA:protein-binding data.

Figure 13 lists the top strands (5'-3') of all the
possible four base pair sequences that could be used as
a defined set of ordered test sequences in the assay.

Figure 14A lists the top strands (5'-3') of all
the possible four base pair sequences that have the
same base composition as the sequence 5'-GATC-3'. This
is another example of a defined, ordered set of sequences
that could be tested in the assay. Figure 14B
presents the general sequence of a test oligonucleotide
(SEQ ID NO:617), where XXXX is the test sequence and N
= A, G, C, or T.
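
As an illustration of such a defined set, the four base sequences sharing the base composition of 5'-GATC-3' are the distinct orderings of its four bases, of which there are 4! = 24. The following is a minimal sketch; the function name `same_composition` is hypothetical and is not part of the disclosure.

```python
from itertools import permutations


def same_composition(seq):
    """Return the distinct orderings of the bases in seq, sorted alphabetically."""
    return sorted({"".join(p) for p in permutations(seq)})


# All top-strand sequences with one each of G, A, T and C.
gatc_set = same_composition("GATC")
```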

Figure 15 shows the results of 4 duplicate experiments
in which the binding activity of distamycin was
tested with all possible (256) four base pair sequences.
The oligonucleotides are ranked from 1 to 256
(column 1, "rank") based on their average rank from the
four experiments (column 13, "ave. rank").
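
Ranking by average rank across replicate experiments can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce the figure: the function name `average_ranks`, the input format, the example scores, and the convention that the lowest score receives rank 1 are all hypothetical, and every sequence is assumed to appear in every experiment.

```python
def average_ranks(experiments):
    """Rank sequences within each experiment, then order them by mean rank.

    experiments is a list of dicts mapping sequence -> score (hypothetical
    format). Within one experiment the lowest score receives rank 1; that
    direction is an assumption made for this sketch.
    """
    totals = {}
    for scores in experiments:
        ordered = sorted(scores, key=lambda s: scores[s])
        for rank, seq in enumerate(ordered, start=1):
            totals[seq] = totals.get(seq, 0) + rank
    n = len(experiments)
    means = {seq: total / n for seq, total in totals.items()}
    final_order = sorted(means, key=lambda s: means[s])
    return means, final_order


# Made-up scores for three sequences in four experiments.
runs = [
    {"TTTC": 10, "TATA": 30, "GGCC": 90},
    {"TTTC": 20, "TATA": 15, "GGCC": 80},
    {"TTTC": 12, "TATA": 25, "GGCC": 95},
    {"TTTC": 11, "TATA": 40, "GGCC": 85},
]
means, final_order = average_ranks(runs)
```

Averaging ranks rather than raw scores keeps one unusually scaled experiment from dominating the final order.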

Figure 16 shows the average ranks (Figure 15)
plotted against the ideal ranks 1 to 256.

Figure 17 shows the average r% scores (Figure 15) 
plotted against the rank of 1 to 256. 

Figure 18 shows the results of eight experiments 
with actinomycin D. The r% scores and rank are shown 
for each of the 256 oligonucleotides.

Figure 19 shows the average r% versus rank, by
average rank (data from Figure 18).

Figure 20 shows the ideal and average ranks for
each of the 256 oligonucleotides.

Figure 21 shows the results of a position analysis
for actinomycin D preference.

Figure 22 presents the data for a dinucleotide 
analysis of actinomycin D binding preference. 

Figure 23 graphically displays the results 
presented in Figure 22.

Figure 24 graphically displays the data presented
in Figure 22, where the data are presented in a combined
bar chart so that the cumulative results for any
dinucleotide pair are tabulated in a single bar.

Figure 25 shows the top strands of 16 possible
duplex DNA target sites for binding bis-distamycins.

Figure 26 shows examples of bis-distamycin target 
sequences for bis-distamycins with internal flexible 
and/or variable length linkers targeted to sites 
comprised of two TTCC sequences, where N is any base.

Figures 27A to 27H show sample oligonucleotides 
for competition binding studies using the assay of the 
present invention. 

Figure 28 shows the DNA sequences of the HIV proviral
promoter region. Several transcription factor
binding sites are marked. 

Figures 29A to 29D illustrate sample test oligonucleotides
for use in the polymerase chain reaction
based selection technique of the present invention. In
Figure 29A, X is the number of bases that comprise the
test site.

Figure 30 illustrates a sample test oligonucleotide
for use in the assay of the present invention,
where the test oligonucleotide employs several different
DNA:protein interaction systems.

Figure 31 illustrates the results of screening a
selected test sequence with a single DNA:protein
interaction system. In the figure, the test site is
shown in bold and the potential binding site for the test
molecule is underlined.

Figure 32 illustrates the results of screening the 
same selected test sequence as shown in Figure 31, but
using a different single DNA:protein interaction
system. In the figure, the test site is shown in bold and
the potential binding site for the test molecule is
underlined.

Detailed Description of the Invention 

I. Definitions:

Adjacent is used to describe the distance relationship
between two neighboring sites. Adjacent sites
are 20 bp or less apart, and can be separated by any
fewer number of bases including the situation where the
sites are immediately abutting one another. "Flanking"
is a synonym for adjacent.

Bound DNA, as used in this disclosure, refers to
the DNA that is bound by the protein used in the assay
(e.g., a test oligonucleotide containing the UL9
binding sequence bound to the UL9 protein).

Coding sequences or coding regions are DNA sequences
that code for RNA transcripts, unless specified
otherwise.

Dissociation is the process by which two molecules 
cease to interact: the process occurs at a fixed 
average rate under specific physical conditions.

Functional binding is the noncovalent association 
of a protein or small molecule to the DNA molecule. In 
one embodiment of the assay of the present invention 
the functional binding of the UL9 protein to a screening
sequence (i.e., its cognate DNA binding site) has
been evaluated using filter binding or gel band-shift
experiments.

Half-life is herein defined as the time required
for one-half of the associated complexes, e.g.,
DNA:protein complexes, to dissociate.
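
For a complex that dissociates with simple first-order kinetics, which is an assumption of this sketch rather than a statement made in the disclosure, the half-life follows from the dissociation rate constant as t(1/2) = ln(2) / k_off. The names `half_life`, `fraction_remaining`, and `k_off`, and the example value, are hypothetical.

```python
import math


def half_life(k_off):
    """Half-life of a complex, assuming first-order dissociation.

    k_off is the dissociation rate constant (per unit time); the result is
    in the same time unit.
    """
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off


def fraction_remaining(k_off, t):
    """Fraction of complexes still associated at time t (first-order decay)."""
    return math.exp(-k_off * t)


# Illustrative value only: k_off = 0.01 per minute.
t_half = half_life(0.01)
```

With these assumptions, `fraction_remaining(0.01, t_half)` evaluates to one-half, matching the definition above.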

Heteropolymers are molecules comprised of at least
two different subunits, each representing a different
type or class of molecule. The covalent coupling of
different subunits, such as DNA-binding molecules or
portions of DNA-binding molecules, results in the
formation of a heteropolymer: for example, the
coupling of a non-intercalating homopolymeric DNA-binding
molecule, such as distamycin, to an intercalating
drug, such as daunomycin. Likewise, the coupling
of netropsin, which is essentially a molecular subunit
of distamycin, to daunomycin would also yield a heteropolymer.
As a further example, the coupling of distamycin,
netropsin, or daunomycin to a DNA-binding
homopolymer, such as a triplex-forming oligonucleotide,
would result in a heteropolymer.

Homopolymers are molecules that are comprised of a repeating
subunit of the same type or class. Two examples of duplex
DNA-binding homopolymers are as follows: (i) triplex-forming
oligonucleotides or oligonucleotide analogs, which are composed
of repeating subunits of nucleotides or nucleotide analogs, and
(ii) oligopeptides, which are composed of repeating subunits
linked by peptide bonds (e.g., distamycin, netropsin).

Sequence-preferential binding refers to DNA-binding molecules
that generally bind DNA but that show preference for binding to
some DNA sequences over others. Sequence-preferential binding is
typified by several of the small molecules tested in the present
disclosure, e.g., distamycin. Sequence-preferential and
sequence-specific binding can be evaluated using a test matrix
such as is presented in Figure 12. For a given DNA-binding
molecule, there is a spectrum of differential affinities for
different DNA sequences, ranging from non-sequence-specific (no
detectable preference) to sequence-preferential to absolute
sequence specificity (i.e., the recognition of only a single
sequence among all possible sequences, as is the case with many
restriction endonucleases).

Sequence-specific binding refers to DNA-binding molecules which
have a strong DNA sequence binding preference. For example, the
following demonstrate typical sequence-specific DNA-binding:
(i) multimers (heteropolymers and homopolymers) of the present
invention (e.g., Section IV.E.1, "Multimerization"; Example 13),
and (ii) restriction enzymes and the proteins listed in
Table IV.

Screening sequence is the DNA sequence that defines the cognate
binding site for the DNA-binding protein: in the case of UL9,
the screening sequence can, for example, be SEQ ID NO:601.

Small molecules are desirable as therapeutics for several
reasons related to drug delivery, including the following:
(i) they are commonly less than 10 K molecular weight; (ii) they
are more likely to be permeable to cells; (iii) unlike peptides
or oligonucleotides, they are less susceptible to degradation by
many cellular mechanisms; and (iv) they are not as apt to elicit
an immune response. Many pharmaceutical companies have extensive
libraries of chemical and/or biological mixtures, often fungal,
bacterial, or algal extracts, that would be desirable to screen
with the assay of the present invention. Small molecules may be
either biological or synthetic organic compounds, or even
inorganic compounds (e.g., cisplatin).

Test sequence is a DNA sequence adjacent to the screening
sequence. The assay of the present invention screens for
molecules that, when bound to the test sequence, affect the
interaction of the DNA-binding protein with its cognate binding
site (i.e., the screening sequence). Test sequences can be
placed adjacent to either or both ends of the screening
sequence. Typically, binding of molecules to the test sequence
interferes with the binding of the DNA-binding protein to the
screening sequence. However, some molecules binding to these
sequences may have the reverse effect, causing an increased
binding affinity of the DNA-binding protein to the screening
sequence. Some molecules, even while binding in a
sequence-specific or sequence-preferential manner, might have no
effect in the assay. These molecules would not be detected in
the assay.

Unbound DNA, as used in this disclosure, refers to the DNA that
is not bound by the protein used in the assay (i.e., in the
examples of this disclosure, the UL9 protein).

II. The Assay.

One feature of the present invention is that it provides an
assay to identify small molecules that will bind in a
sequence-specific manner to medically significant DNA target
sites. The assay facilitates the development of a new field of
pharmaceuticals that operates by interfering with specific DNA
functions, such as crucial DNA:protein interactions. A
sensitive, well-controlled assay has been developed (i) to
detect DNA-binding molecules and (ii) to determine their
sequence-specificity and affinity. The assay can be used to
screen large biological and chemical libraries. For example, the
assay will be used to detect sequence-specific DNA-binding
molecules in fermentation broths or extracts from various
microorganisms.

Furthermore, another application for the assay is to determine
the sequence specificity and relative affinities of known
DNA-binding drugs (and other DNA-binding molecules) for
different DNA sequences. Such drugs, which are currently used
primarily as antibiotics or anticancer drugs, may have
previously unidentified activities that make them strong
candidates for therapeutics or therapeutic precursors in
entirely different areas of medicine. The use of the assay to
determine the sequence-binding preference of these known
DNA-binding molecules enables the rational design of novel
DNA-binding molecules with enhanced sequence-binding preference.
The methods for designing and testing these novel DNA-binding
molecules are described below.

The screening assay of the present invention is basically a
competition assay that is designed to test the ability of a test
molecule to compete with a DNA-binding protein for binding to a
short, synthetic, double-stranded oligodeoxynucleotide that
contains the recognition sequence for the DNA-binding protein
flanked on either or both sides by a variable test site. The
variable test site may contain any DNA sequence that provides a
reasonable recognition sequence for a DNA-binding test molecule.
Molecules that bind to the test site alter the binding
characteristics of the protein in a manner that can be readily
detected. The extent to which such molecules are able to alter
the binding characteristics of the protein is likely to be
directly proportional to the affinity of the test molecule for
the DNA test site. The relative affinity of a given molecule for
different oligonucleotide sequences at the test site (i.e., test
sequences) can be established by examining the molecule's effect
on the DNA:protein interaction using each of the test sequences.

The assay can be used to test specific target sequences and to
identify novel DNA-binding molecules. Also, the assay provides a
means for the determination of the high affinity DNA binding
sites for a given DNA-binding molecule, thus facilitating the
identification of specific target sequences.

A. General Considerations.

The assay of the present invention has been designed for
detecting test molecules or compounds that affect the rate of
transfer of a specific DNA molecule from one protein molecule to
another identical protein in solution.

A mixture of DNA and protein is prepared in solution. The
concentration of protein is in excess to the concentration of
the DNA so that virtually all of the DNA is found in DNA:protein
complexes. The DNA is a double-stranded oligonucleotide that
contains the recognition sequence for a specific DNA-binding
protein (i.e., the screening sequence). The protein used in the
assay contains a DNA-binding domain that is specific for binding
to the sequence within the oligonucleotide. The physical
conditions of the solution (e.g., pH, salt concentration,
temperature) are adjusted such that the half-life of the complex
is amenable to performing the assay (optimally a half-life of
5-120 minutes), preferably in a range that is close to normal
physiological conditions.

As one DNA:protein complex dissociates, the released DNA rapidly
reforms a complex with another protein in solution. Since the
protein is in excess to the DNA, dissociation of one complex
always results in the rapid reassociation of the DNA into
another DNA:protein complex. At equilibrium, very few DNA
molecules will be unbound. If the unbound DNA is the component
of the system that is measured, the minimum background of the
assay is the amount of unbound DNA observed during any given
measurable time period. If the capture/detection system used for
capturing the unbound DNA is irreversible, the brevity of the
observation period (the length of time used to capture the
unbound DNA) and the sensitivity of the detection system define
the lower limits of background DNA.
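
The dependence of background on the observation period can be illustrated with a rough sketch. It rests on two assumptions that are not taken from the disclosure: dissociation is first-order, and an irreversible capture system traps every DNA molecule released during the observation window before it can rebind protein. Under those assumptions the result is an upper bound on captured background, not a measured value; the function name is hypothetical.

```python
import math


def max_background_fraction(half_life_min: float, observation_min: float) -> float:
    """Upper-bound fraction of DNA captured as 'unbound' during the window.

    Assumes first-order dissociation and a capture step that outcompetes
    reassociation, so each complex that dissociates once is counted.
    """
    if half_life_min <= 0 or observation_min < 0:
        raise ValueError("half-life must be positive and window non-negative")
    k_off = math.log(2) / half_life_min
    return 1.0 - math.exp(-k_off * observation_min)
```

Under these assumptions, shortening the observation window relative to the half-life lowers the background, which is the relationship the paragraph above describes qualitatively.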

Figure 1 illustrates how (i) such a protein can be displaced
from its cognate binding site, (ii) a protein can be prevented
from binding its cognate binding site, and (iii) the kinetics of
the DNA:protein interaction can be altered. In each case, the
binding site for the test molecule is located at a site flanking
the recognition sequence for the DNA-binding protein
(Figure 1A). One mechanism is steric hindrance of protein
binding by a small molecule (competitive inhibition; Figure 1B).
Alternatively, a molecule may interfere with a DNA:protein
binding interaction by inducing a conformational change in the
DNA (allosteric interference, noncompetitive inhibition;
Figure 1C). In either event, if a test molecule that binds the
oligonucleotide hinders binding of the protein, even
transiently, the rate of transfer of DNA from one protein to
another will be decreased. This will result in a net increase in
the amount of unbound DNA and a net decrease in the amount of
protein-bound DNA. In other words, an increase in the amount of
unbound DNA or a decrease in the amount of bound DNA indicates
the presence of an inhibitor, regardless of the mechanism of
inhibition (competitive or noncompetitive).

Alternatively, molecules may be isolated that, when bound to the
DNA, cause an increased affinity of the DNA-binding protein for
its cognate binding site. In this case, the assay control
samples (no drug added) are adjusted to less than 100%
DNA:protein complex so that the increase in binding can be
detected. The amount of unbound DNA (observed during a given
measurable time period after the addition of the molecule) will
decrease and the amount of bound DNA will increase in the
reaction mixture as detected by the capture/detection system
described in Section II.

B - S^^™ **- * Approp ri.tP n N »- 

Experiments performed in support of the present invention have
defined an approach for identifying molecules having
sequence-preferential DNA-binding. In this approach, small
molecules binding to sequences adjacent to the cognate binding
sequence can inhibit the protein/cognate DNA interaction. This
assay has been designed to use a single DNA:protein interaction
to screen for sequence-specific or sequence-preferential
DNA-binding molecules that recognize virtually any sequence.

While DNA-binding recognition sites are usually quite small
(4-17 bp), the sequence that is protected by the binding protein
is larger (usually 5 bp or more on either side of the
recognition sequence), as detected by DNAase I protection
(Galas, et al.) or methylation interference (Siebenlist,
et al.).

Experiments performed in support of the present invention
demonstrated that a single protein and its cognate DNA-binding
sequence can be used to assay virtually any DNA sequence by
placing a sequence of interest adjacent to the cognate site: a
small molecule bound to the adjacent site can be detected by
alterations in the binding characteristics of the protein to its
cognate site. Such alterations might occur by either steric
hindrance (which would cause the dissociation of the protein) or
induced conformational changes in the recognition sequence for
the protein (which may cause either enhanced binding or, more
likely, decreased binding of the protein to its cognate site).

1. Criteria for Choosing an Appropriate DNA-Binding Protein.

There are several considerations involved in choosing
DNA:protein complexes that can be employed in the assay of the
present invention, including:

a.) The half-life of the DNA:protein complex should be short
enough to accomplish the assay in a reasonable amount of time.
The interactions of some proteins with their cognate binding
sites in DNA can be measured in days, not minutes: such tightly
bound complexes would inconveniently lengthen the period of time
it takes to perform the assay.

b.) The half-life of the complex should be long enough to allow
the measurement of unbound DNA in a reasonable amount of time.
For example, the level of free DNA is dictated by the ratio
between the time needed to measure free DNA and the amount of
free DNA that occurs naturally due to the dissociation of the
complex during the measurement time period.

In view of the above two considerations, practical useful
DNA:protein half-lives fall in the range of approximately two
minutes to several days: shorter half-lives may be accommodated
by faster equipment and longer half-lives may be accommodated by
destabilizing the binding conditions for the assay.

c.) A further consideration is that the kinetic interactions of
the DNA:protein complex should be relatively insensitive to the
nucleotide sequences flanking the recognition sequence. The
affinity of DNA-binding proteins may be affected by differences
in the sequences adjacent to the recognition sequence. If the
half-life of the complex is affected by the flanking sequence,
the analysis of comparative binding data between different
flanking oligonucleotide sequences becomes difficult but is not
impossible.

2. Testing DNA:Protein Interactions for Use in the Assay.

a.) Other DNA:Protein Interactions Useful in the Method of the
Present Invention. There are many known DNA:protein interactions
that may be useful in the practice of the present invention,
including (i) the DNA:protein interactions listed in Table IV,
(ii) bacterial, yeast, and phage systems such as lambda
oL-oR/cro, and (iii) modified restriction enzyme systems (e.g.,
protein binding in the absence of divalent cations, see
Section IV). Any protein that binds to a specific recognition
sequence may be useful in the present invention. One
constraining factor is the effect of the immediately adjacent
sequences (the test sequences) on the affinity of the protein
for its recognition sequence. DNA:protein interactions in which
there is little or no effect of the test sequences on the
affinity of the protein for its cognate site are preferable for
use in the described assay; however, DNA:protein interactions
that exhibit test-sequence-dependent differential binding may
still be useful if algorithms that compensate for the
differential affinity are applied to the analysis of data. In
general, the effect of flanking sequence composition on the
binding of the protein is likely to be correlated to the length
of the recognition sequence for the DNA-binding protein. That
is, the kinetics of binding for proteins with shorter
recognition sequences are more likely to suffer from flanking
sequence effects, while the kinetics of binding for proteins
with longer recognition sequences are less likely to be affected
by flanking sequence composition. The present disclosure
provides methods and guidance for testing the usefulness of such
DNA:protein interactions in the screening assay.
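
The disclosure does not specify which compensating algorithms are meant. One simple possibility, offered only as an illustrative assumption, is to normalize each drug-treated measurement by the no-drug control for the same test oligonucleotide, so that intrinsic differences in protein affinity between flanking sequences cancel out. The function name and the dictionary layout are hypothetical.

```python
def relative_binding(bound_with_drug: dict, bound_control: dict) -> dict:
    """Normalize bound fractions to each oligonucleotide's own control.

    bound_with_drug and bound_control map a test-oligonucleotide name to
    the fraction of DNA found in DNA:protein complex, with and without
    the test molecule. A value near 1.0 means the test molecule had
    little effect; values well below 1.0 suggest inhibition.
    """
    normalized = {}
    for name, control in bound_control.items():
        if control <= 0:
            raise ValueError(f"control for {name!r} must be positive")
        normalized[name] = bound_with_drug[name] / control
    return normalized
```

With this normalization, two oligonucleotides whose control binding differs (say 0.8 versus 0.6) can still be compared by the proportion of binding lost on addition of the test molecule.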

b.) The Use of UL9 Proteins in the Practice of the Present
Invention.

Experiments performed in support of the present invention have
identified a DNA:protein interaction that is particularly useful
for the above described assay: the Herpes Simplex Virus (HSV)
UL9 protein that binds the HSV origin of replication (oriS). The
UL9 protein has fairly stringent sequence specificity. There
appear to be three binding sites for UL9 in oriS, SEQ ID NO:601,
SEQ ID NO:602 and SEQ ID NO:615 (Elias, et al.; Stow, et al.).
One sequence (SEQ ID NO:601) binds with at least 10-fold higher
affinity than the second sequence (SEQ ID NO:602): the
embodiments described below use the higher affinity binding site
(SEQ ID NO:601). Another useful UL9-binding site, albeit a lower
affinity binding site, SEQ ID NO:641, has also been identified.

DNA:protein association reactions are performed in solution. The
DNA:protein complexes can be separated from free DNA by any of
several methods. One particularly useful method for the initial
study of DNA:protein interactions has been visualization of
binding results using band shift gels (Example 3A). In this
method DNA:protein binding reactions are applied to
polyacrylamide/TBE gels and the labeled complexes and free
labeled DNA are separated electrophoretically. These gels are
fixed, dried, and exposed to X-ray film. The resulting
autoradiograms are examined for the amount of free probe that is
migrating separately from the DNA:protein complex. These assays
include (i) a lane containing only free labeled probe, and
(ii) a lane where the sample is labeled probe in the presence of
a large excess of binding protein. The band shift assays allow
visualization of the ratios between DNA:protein complexes and
free probe. However, they are less accurate than filter binding
assays for rate-determining experiments due to the lag time
between loading the gel and electrophoretic separation of the
components.

The filter binding method is particularly useful in determining
the half-life for oligonucleotide:protein complexes
(Example 3B). In the filter binding assay, DNA:protein complexes
are retained on a filter while free DNA passes through the
filter. This assay method is more accurate for half-life
determinations because the separation of DNA:protein complexes
from free probe is very rapid. The disadvantage of filter
binding is that the nature of the DNA:protein complex cannot be
directly visualized. If, for example, the competing molecule
were also a protein competing for the binding of a site on the
DNA molecule, filter binding assays could not differentiate
between the binding of the two proteins nor yield information
about whether one or both proteins are binding.

C. Preparation of Full Length UL9 and UL9-COOH Polypeptides.

UL9 protein has been prepared by a number of recombinant
techniques (Example 2). The full length UL9 protein has been
prepared from baculovirus infected insect cultures (Example 3A,
B, and C). Further, a portion of the UL9 protein that contains
the DNA-binding domain (UL9-COOH) has been cloned into a
bacterial expression vector and produced by bacterial cells
(Example 3D and E). The DNA-binding domain of UL9 is contained
within the C-terminal 317 amino acids of the protein (Weir,
et al.). The UL9-COOH polypeptide was inserted into the
expression vector in-frame with the glutathione-S-transferase
(gst) protein. The gst/UL9 fusion protein was purified using
affinity chromatography (Example 3E). The vector also contained
a thrombin cleavage site at the junction of the two
polypeptides. Therefore, once the fusion protein was isolated
(Figure 8, lane 2) it was treated with thrombin, cleaving the
UL9-COOH/gst fusion protein from the gst polypeptide (Figure 8,
lane 3). The UL9-COOH-gst fusion polypeptide was obtained at a
protein purity of greater than 95% as determined using Coomassie
staining.

Other hybrid proteins can be utilized to prepare DNA-binding
proteins of interest. For example, a DNA-binding protein coding
sequence can be fused in-frame with a sequence encoding the
thrombin site and also in-frame with the β-galactosidase coding
sequence. Such hybrid proteins can be isolated by affinity or
immunoaffinity columns (Maniatis, et al.; Pierce, Rockford IL).
Further, DNA-binding proteins can be isolated by affinity
chromatography based on their ability to interact with their
cognate DNA binding site. For example, the UL9 DNA-binding site
(SEQ ID NO:601) can be covalently linked to a solid support
(e.g., CNBr-activated Sepharose 4B beads, Pharmacia, Piscataway
NJ), extracts passed over the support, the support washed, and
the DNA-binding protein then isolated from the support with a
salt gradient (Kadonaga). Alternatively, other expression
systems in bacteria, yeast, insect cells or mammalian cells can
be used to express adequate levels of a DNA-binding protein for
use in this assay.

The results presented below in regard to the DNA-binding ability
of the truncated UL9 protein suggest that full length
DNA-binding proteins are not required for the DNA:protein assay
of the present invention: only a portion of the protein
containing the cognate site recognition function may be
required. The portion of a DNA-binding protein required for
DNA-binding can be evaluated using a functional binding assay
(Example 4A). The rate of dissociation can be evaluated
(Example 4B) and compared to that of the full length DNA-binding
protein. However, any DNA-binding peptide, truncated or full
length, may be used in the assay if it meets the criteria
outlined in Section II.B.1, "Criteria for choosing an
appropriate DNA-binding protein". This remains true whether or
not the truncated form of the DNA-binding protein has the same
affinity as the full length DNA-binding protein.

D. Functional Binding and Rate of Dissociation.

The full length UL9 and purified UL9-COOH proteins were tested
for functional activity in "band shift" assays (see Example 4A).
The buffer conditions were optimized for DNA:protein-binding
(Example 4C) using the UL9-COOH polypeptide. These DNA-binding
conditions also worked well for the full-length UL9 protein.
Radiolabeled oligonucleotides (SEQ ID NO:614) that contained the
11 bp UL9 DNA-binding recognition sequence (SEQ ID NO:601) were
mixed with each UL9 protein in appropriate binding buffer. The
reactions were incubated at room temperature for 10 minutes
(binding occurs in less than 2 minutes) and the products were
separated electrophoretically on non-denaturing polyacrylamide
gels (Example 4A).

The degree of DNA:protein-binding could be determined from the
ratio of labeled probe present in DNA:protein complexes versus
that present as free probe. This ratio was typically determined
by optical scanning of autoradiograms and comparison of band
intensities. Other standard methods may be used as well for this
determination, such as scintillation counting of excised bands.
The UL9-COOH polypeptide and the full length UL9 polypeptide, in
their respective buffer conditions, bound the target
oligonucleotide equally well.
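
As a small illustration of the band-intensity comparison, the sketch below converts two intensity readings into a bound fraction. It assumes that background-subtracted intensities are proportional to the amount of labeled probe in each band; the function name and units are hypothetical.

```python
def fraction_bound(complex_intensity: float, free_intensity: float) -> float:
    """Fraction of labeled probe in DNA:protein complex for one gel lane.

    Inputs are background-subtracted band intensities (arbitrary units),
    assumed proportional to the amount of labeled probe in each band.
    """
    if complex_intensity < 0 or free_intensity < 0:
        raise ValueError("intensities must be non-negative")
    total = complex_intensity + free_intensity
    if total == 0:
        raise ValueError("no signal in lane")
    return complex_intensity / total
```

The same calculation would apply to scintillation counts of excised bands, with counts in place of scanned intensities.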

The rate of dissociation was determined using competition
assays. An excess of unlabeled oligonucleotide that contained
the UL9 binding site was added to each reaction. This unlabeled
oligonucleotide acts as a specific inhibitor, capturing the UL9
protein as it dissociates from the labeled oligonucleotide
(Example 4B). The dissociation half-life, as determined by a
band-shift assay, for both full length UL9 and UL9-COOH was
approximately 4 hours at 4°C or approximately 10 minutes at room
temperature. Neither non-specific oligonucleotides (a
10,000-fold excess) nor sheared herring sperm DNA (a
100,000-fold excess) competed for binding with the
oligonucleotide containing the UL9 binding site.
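
A time course from such a competition experiment could be reduced to a single half-life roughly as follows. This is a sketch that assumes first-order decay of the labeled complex once excess unlabeled competitor is added, and fits a straight line to the logarithm of the bound fraction by ordinary least squares. The function name and the data are illustrative, not taken from the Examples.

```python
import math


def fit_half_life(times_min, fractions_bound):
    """Estimate the complex half-life from a dissociation time course.

    Fits ln(fraction bound) versus time by least squares; under
    first-order decay the slope equals -k_off, so the half-life is
    ln(2) / k_off.
    """
    if len(times_min) != len(fractions_bound) or len(times_min) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(f) for f in fractions_bound]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("bound fraction does not decay over time")
    return -math.log(2) / slope


# Hypothetical time course: bound fraction halves every 10 minutes.
example_half_life = fit_half_life([0, 10, 20, 30], [1.0, 0.5, 0.25, 0.125])
```

Fitting on a log scale weights late, low-signal points heavily, so in practice noisy points near the detection limit might be excluded before fitting.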

E. oriS Flanking Sequence Variation.

As mentioned above, one feature of a DNA:protein-binding system
to be used in the assay of the present invention is that the
DNA:protein interaction is not affected by the nucleotide
sequence of the regions adjacent to the DNA-binding site. The
sensitivity of any DNA:protein-binding reaction to the
composition of the flanking sequences can be evaluated by the
functional binding assay and dissociation assay described above.

To test the effect of flanking sequence variation on UL9 binding
to the oriS SEQ ID NO:601 sequence, oligonucleotides were
constructed with 20-30 different sequences (i.e., the test
sequences) flanking the 5' and 3' sides of the UL9 binding site.
Further, oligonucleotides were constructed with point mutations
at several positions within the UL9 binding site. Most point
mutations within the binding site destroyed recognition. Several
changes did not destroy recognition, and these include
variations at sites that differ between the UL9 binding sites
(SEQ ID NO:601, SEQ ID NO:602, SEQ ID NO:615 and SEQ ID NO:641):
the second UL9 binding site (SEQ ID NO:602) shows a ten-fold
decrease in UL9:DNA binding affinity (Elias, et al.) relative to
the first (SEQ ID NO:601). On the other hand, sequence variation
at the test site (also called the test sequence), adjacent to
the screening site (Figure 5, Example 5), had virtually no
effect on binding or the rate of dissociation.

The finding that the nucleotide sequence in the test site, which
flanks the screening site, has no effect on the kinetics of UL9
binding in any of the oligonucleotides tested is striking. This
allows the direct comparison of the effect of a DNA-binding
molecule on test oligonucleotides that contain different test
sequences. Since the only difference between test
oligonucleotides is the difference in nucleotide sequence at the
test site(s), and since the nucleotide sequence at the test site
has no effect on UL9 binding, any differential effect observed
between two test oligonucleotides in response to a DNA-binding
molecule must be due solely to the differential interaction of
the DNA-binding molecule with the test sequence(s). In this
manner, the insensitivity of UL9 to the test sequences flanking
the UL9 binding site greatly facilitates the interpretation of
results. Each test oligonucleotide acts as a control sample for
all other test oligonucleotides. This is particularly true when
ordered sets of test sequences are tested (e.g., testing all 256
four base pair sequences (Figure 13) for binding to a single
drug).
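
The ordered set of 256 four base pair test sequences mentioned above follows from four possible bases at each of four positions (4^4 = 256). A minimal sketch that enumerates them is given below; the function name is hypothetical, and the alphabetical ordering is an arbitrary choice that need not match the layout of Figure 13.

```python
from itertools import product


def all_test_sequences(length: int = 4) -> list:
    """Enumerate every DNA sequence of the given length over A, C, G, T."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


four_bp_sequences = all_test_sequences(4)
```

The list starts at "AAAA" and ends at "TTTT"; each entry would be placed at the test site flanking the fixed screening sequence.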

Taken together, the above experiments indicate that the UL9-COOH
polypeptide binds the SEQ ID NO:601 sequence with
(i) appropriate strength, (ii) an acceptable dissociation time,
and (iii) indifference to the nucleotide sequences flanking the
screening site. These features suggested that the UL9/oriS
system could provide a versatile assay for detection of small
molecule/DNA-binding involving any number of specific nucleotide
sequences.

The above-described experiment can be used to screen other
DNA:protein interactions to determine their usefulness in the
present assay.



F. Small Molecules as Sequence-Specific Competitive Inhibitors.

To test the utility of the present assay system, several small
molecules that have sequence-binding preferences (e.g., a
preference for AT-rich versus GC-rich sequences) have been
tested.

Distamycin A binds relatively weakly to DNA (K_A = 2 x 10^5
M^-1) with a preference for non-alternating AT-rich sequences
(Jain, et al.; Sobell; Sobell, et al.). Actinomycin D binds DNA
more strongly (K_A = 7.6 x 10^-7 M^-1) than Distamycin A and has
been reported to have a relatively strong preference for the
dinucleotide sequence dGdC (Luck, et al.; Zimmer; Wartel). Each
of these molecules poses a stringent test for the assay.
Distamycin A tests the sensitivity of the assay because of its
relatively weak binding. Actinomycin D challenges the ability to
utilize flanking sequences since the UL9 recognition sequence
contains a dGdC dinucleotide: therefore, it might be anticipated
that all of the oligonucleotides, regardless of the test
sequence flanking the assay site, might be equally affected by
actinomycin D.

In addition, Doxorubicin, a known anti-cancer agent that binds
DNA in a sequence-preferential manner (Chen, K-X, et al.), has
been tested for preferential DNA sequence binding using the
assay of the present invention.

Actinomycin D, Distamycin A, and Doxorubicin have been tested
for their ability to preferentially inhibit the binding of UL9
to oligonucleotides containing different sequences flanking the
UL9 binding site (Example 6, Figure 5). Furthermore, distamycin
A and actinomycin D have been screened against all possible 256
4 bp DNA sequences. Binding assays were performed as described
in Example 5. These studies were completed under conditions in
which UL9 is in excess of the DNA (i.e., most of the DNA is in
DNA:protein complexes).

In the preliminary studies, distamycin A was tested with 5
different test sequences flanking the UL9 screening sequence:
SEQ ID NO:605 to SEQ ID NO:609. The results shown in Figure 10A
demonstrate that Distamycin A preferentially disrupts binding to
the test sequences UL9 polyT, UL9 polyA and, to a lesser extent,
UL9 ATAT. Figure 10A also shows the concentration dependence of
the inhibitory effect of distamycin A: at 1 µM distamycin A most
of the DNA:protein complexes are intact (top band) with free
probe appearing in the UL9 polyT and UL9 polyA lanes, and some
free probe appearing in the UL9 ATAT lane; at 4 µM free probe
can be seen in the UL9 polyT and UL9 polyA lanes; at 16 µM free
probe can be seen in the UL9 polyT and UL9 polyA lanes; and at
40 µM the DNA:protein complexes in the UL9 polyT, UL9 polyA and
UL9 ATAT lanes are nearly completely disrupted while some
DNA:protein complexes in the other lanes persist. These results
were consistent with the reported preference of Distamycin A for
non-alternating AT-rich sequences.

Actinomycin D was tested with 8 different test sequences
flanking the UL9 screening sequence: SEQ ID NO:605 to SEQ ID
NO:609, and SEQ ID NO:611 to SEQ ID NO:613. The results shown in
Figure 10B demonstrate that actinomycin D preferentially
disrupts the binding of UL9-COOH to the oligonucleotides UL9
CCCG (SEQ ID NO:605) and UL9 GGGC (SEQ ID NO:606). These
oligonucleotides contain, respectively, three or five dGdC
dinucleotides in addition to the dGdC dinucleotide within the
UL9 recognition sequence. This result is consistent with the
results described in the literature for Actinomycin D binding to
the dinucleotide sequence dGdC. Apparently the presence of a
potential preferred target site within the screening sequence
(oriS, SEQ ID NO:601), as mentioned above, does not interfere
with the function of the assay.
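
Counting candidate dGdC sites in a test sequence can be sketched as below. The helper is hypothetical and reads a single strand 5' to 3'; because 5'-GC-3' is its own reverse complement, the count is the same on either strand of a duplex. The sequences used to exercise it are made up and are not the SEQ ID NO entries of the disclosure.

```python
def count_dinucleotide(sequence: str, dinucleotide: str = "GC") -> int:
    """Count occurrences of a dinucleotide step in a 5'-to-3' sequence."""
    seq = sequence.upper()
    target = dinucleotide.upper()
    if len(target) != 2:
        raise ValueError("dinucleotide must be exactly two bases")
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == target)
```

Such a count gives only the number of potential sites; whether a given site is actually bound depends on sequence context, which is what the assay is meant to reveal.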

Doxorubicin was tested with 8 different test sequences flanking
the UL9 screening sequence: SEQ ID NO:605 to SEQ ID NO:609, and
SEQ ID NO:611 to SEQ ID NO:613. The results shown in Figure 10C
demonstrate that Doxorubicin preferentially disrupts binding to
oriEco3, the test sequence of which differs from oriEco2 by only
one base (compare SEQ ID NO:612 and SEQ ID NO:613). Figure 10C
also shows the concentration dependence of the inhibitory effect
of Doxorubicin: at 15 µM Doxorubicin, the UL9 binding to the
screening sequence is strongly affected when oriEco3 is the test
sequence, and more mildly affected when UL9 polyT, UL9 GGGC, or
oriEco2 is the test sequence; and at 35 µM Doxorubicin most
DNA:protein complexes are nearly completely disrupted, with UL9
polyT and UL9 ATAT showing some DNA still complexed with
protein. Effects similar to those observed at 15 µM were also
observed using Doxorubicin at 150 nM, but at a later time point.

The feasibility studies performed with the limited set of test
sequences, described above, provided evidence that the results of
the assay are not inconsistent with the results reported in the
literature. However, the screening of all possible 256 four
base-pair sequences, using the assay of the present invention,
provides a much more extensive overview of the sequence preferences
of distamycin A and actinomycin D.

The actual ranking of values obtained from the assay, for any given
test compound, can be variable. A number of sequences can be
clustered having similar affinity: although absolute rank might not
be determinable, relative ranks can be determined.

The results obtained in the feasibility studies with both
distamycin A and actinomycin D were corroborated by the results
obtained in the screen of all 256 sequences. In other words, the
rank of the oligonucleotides remained internally consistent in the
larger screen. Further, the screens of distamycin A and actinomycin
D both support the general hypotheses described in the literature:
that is, distamycin A has a preference for binding AT-rich
sequences while actinomycin D has a preference for binding GC-rich
sequences. However, both drug screens of all possible 4
have not been described in the literature.

Based on the data from 4 separate experiments (Examples 10 and 11;
Figures 15, 16 and 17), consensus sequences can be derived for
distamycin binding. One consensus sequence (Example 11) is
relatively AT-rich, although the preference in the 4th base
position is distinctly G or C. The other consensus sequence
(Example 11) is relatively GC-rich, with some of the sequences
having a 75% GC-content. As noted above, the assay data are
consistent with distamycin binding data shown in the literature.

The ability of the assay to distinguish sequence binding preference
using weak DNA-binding molecules with relatively poor
sequence-specificity (such as distamycin A) is a stringent test of
the assay. Accordingly, the present assay seems well-suited for the
identification of molecules having better sequence specificity
and/or higher sequence binding affinity. Further, the results
demonstrate sequence preferential binding with the known
anti-cancer drug Doxorubicin. This result indicates the assay may
be useful for screening mixtures for molecules displaying similar
characteristics that could be subsequently tested for anti-cancer
activities as well as sequence-specific binding.

Other compounds that may be suitable for testing in the present
DNA:protein system or for defining alternate DNA:protein systems
include the following categories of DNA-binding molecules.

A first category of DNA-binding molecules includes
non-intercalating major and minor groove DNA-binding molecules. For
example, two major classes of major groove binding molecules are
DNA-binding proteins (or peptides) and nucleic acids (or nucleic
acid analogs such as those with peptide or morpholino backbones)
capable of forming triplex DNA. There are a number of
non-intercalating minor groove DNA-binding molecules including, but
not limited to, the following: distamycin A, netropsin,
mithramycin, chromomycin and oligomycin, which are used as
antitumor agents and antibiotics; and synthetic antitumor agents
such as berenil, phthalanilides, aromatic bisguanylhydrazones and
bisquaternary ammonium heterocycles (for review, see Baguley,
1982). Non-intercalating DNA-binding molecules vary greatly in
structure: for example, the netropsin-distamycin series are
oligopeptides compared to the diarylamidines berenil and
stilbamidine.

A second category of DNA-binding molecules includes intercalating
DNA-binding molecules. Intercalating agents are an entirely
different class of DNA-binding molecules that have been identified
as antitumor therapeutics and include molecules such as daunomycin
(Chaires, et al.) and nogalomycin (Fox, et al., 1988) (see Remers,
1984).

A third category of DNA-binding molecules includes molecules that
have both groove-binding and intercalating properties. DNA-binding
molecules that have both intercalating and minor groove binding
properties include actinomycin D (Goodisman, et al.), echinomycin
(Fox, et al. 1990), triostin A (Wang, et al.), and luzopeptin (Fox,
1988). In general, these molecules have one or two planar
polycyclic moieties and one or two cyclic oligopeptides.
Luzopeptins, for instance, contain two substituted quinoline
chromophores linked by a cyclic decadepsipeptide. They are closely
related to the quinoxaline family, which includes echinomycin and
triostin A, although the luzopeptins have ten amino acids in the
cyclic peptide, while the quinoxaline family members have eight
amino acids.

In addition to the major classes of DNA-binding molecules, there
are also some small inorganic molecules, such as cobalt hexamine,
which is known to induce Z-DNA formation in regions that contain
repetitive GC sequences (Gessner, et al.). Another example is
cisplatin, cis-diamminedichloroplatinum(II), which is a widely used
anticancer therapeutic. Cisplatin forms a covalent intrastrand
crosslink between the N7 atoms of adjacent guanosines (Rice, et
al.).

Furthermore, there are a few molecules, such as calichemicin, that
have unusual biochemical structures that do not fall in any of the
major categories. Calichemicin is an antitumor antibiotic that
cleaves DNA and is thought to recognize DNA sequences through
carbohydrate moieties (Hawley, et al.). Several DNA-binding
molecules, such as daunomycin, A447C, and cosmomycin B, have sugar
groups, which may play a role in the recognition process.

Limited sequence preferences for some of the above drugs have been
suggested: for example, echinomycin is thought to preferentially
bind to the sequence (A/T)CGT (Fox, et al.). However, the absolute
sequence preferences of the known DNA-binding drugs have never been
demonstrated. Despite the large number of publications in this
field, prior to the development of the assay described herein, no
methods were available for determining sequence preferences among
all possible binding sequences.

G. Theoretical Considerations on the Concentration of Assay
Components.

There are two major components in the assay, the test
oligonucleotide (i.e., the test sequence) and the DNA-binding
domain of UL9, which is described below. A number of theoretical
considerations have been employed in establishing the assay system.
In one embodiment of the invention, the assay is used as a
mass-screening assay: in this embodiment the smallest volumes and
concentrations possible are desirable. Each assay typically uses
about 0.1-0.5 ng DNA in a 15-20 µl reaction volume (approximately
0.3-1.5 nM). The protein concentration is in excess and can be
varied to increase or decrease the sensitivity of the assay. In the
simplest scenario (steric hindrance), where the small molecule is
acting as a competitive inhibitor and the ratio of DNA:protein and
DNA-binding test molecule:DNA is 1:1, the system kinetics can be
described by the following equations:



$$D + P \rightleftharpoons D{:}P, \quad \text{where } k_{fp}/k_{bp} = K_{eq,p} = [D{:}P]/([D][P])$$

and

$$D + X \rightleftharpoons D{:}X, \quad \text{where } k_{fx}/k_{bx} = K_{eq,x} = [D{:}X]/([D][X])$$

D = DNA, P = protein, X = DNA-binding molecule, $k_{fp}$ and
$k_{fx}$ are the rates of the forward reaction for the DNA:protein
interaction and DNA:drug interaction, respectively, and $k_{bp}$
and $k_{bx}$ are the rates of the backwards reactions for the
respective interactions. Brackets, [], indicate molar concentration
of the components.
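
The two equilibria above can be combined into a single expression for the fraction of DNA held by protein. The following is a minimal sketch, assuming that protein and drug compete for one site per DNA molecule and that both are in sufficient excess over DNA for their free concentrations to be approximated by their total concentrations. The function name and the numerical values are illustrative assumptions, not measured parameters.

```python
def fraction_protein_bound(k_p, k_x, p_free, x_free):
    """Fraction of DNA sites occupied by protein under competitive binding.

    k_p, k_x       : equilibrium binding constants [D:P]/([D][P]) and
                     [D:X]/([D][X]), in 1/M
    p_free, x_free : free protein and free drug concentrations, in M
    """
    protein_term = k_p * p_free   # [D:P]/[D]
    drug_term = k_x * x_free      # [D:X]/[D]
    # Total DNA = [D] * (1 + protein_term + drug_term)
    return protein_term / (1.0 + protein_term + drug_term)


# Hypothetical values chosen only to illustrate the trend.
no_drug = fraction_protein_bound(k_p=1e9, k_x=1e6, p_free=10e-9, x_free=0.0)
with_drug = fraction_protein_bound(k_p=1e9, k_x=1e6, p_free=10e-9, x_free=10e-6)
print(no_drug, with_drug)
```

Under these assumed values, adding drug lowers the protein-bound fraction, which corresponds to the appearance of free probe in the assay.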



In the assay, both the protein, P, and the DNA-binding molecule or
drug, X, are competing for the DNA. If steric hindrance is the
mechanism of inhibition, the assumption can be made that the two
molecules are competing for the same site. When the concentration
of DNA equals the concentration of the DNA:drug or DNA:protein
complex, the equilibrium binding constant, $K_{eq}$, is equal to
the reciprocal of the protein concentration (1/[P]). When all three
components are mixed together, the relationship between the drug
and the protein can be described as:

$$K_{eq,p} = z\,(K_{eq,x})$$

where $K_{eq,p}$ and $K_{eq,x}$ are the equilibrium binding
constants of the protein and the drug, respectively, and "z"
defines the difference in affinity for the DNA between P and X. For
example, if z = 4, then the affinity of the drug is 4-fold lower
than the affinity of the protein for the DNA molecule. The
concentration of X, therefore, must be 4-fold greater than the
concentration of P, to compete equally for the DNA molecule. Thus,
the equilibrium affinity constant of UL9 will define the minimum
level of detection with respect to the concentration and/or
affinity of the drug. Low affinity DNA-binding molecules will be
detected only at high concentrations; likewise, high affinity
molecules can be detected at relatively low concentrations. With
certain test sequences, complete inhibition of UL9 binding at
markedly lower concentrations than indicated by these analyses has
been observed, probably indicating that certain sites among those
chosen for feasibility studies have affinities higher than
previously published. Note that relatively high concentrations of
known drugs can be utilized for testing sequence specificity. In
addition, the binding constant of UL9 can be readily lowered by
altering the pH or salt concentration in the assay if it ever
becomes desirable to screen for molecules that are found at low
concentration (e.g., in a fermentation broth or extract).
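
The requirement that the drug concentration exceed the protein concentration by the factor z can be written out as a short derivation. This is a sketch only. It writes $K_{eq,p}$ and $K_{eq,x}$ for the equilibrium binding constants of protein and drug, and it assumes that "competing equally" means the DNA:protein and DNA:drug complexes are present at equal concentration, with [P] and [X] taken as free concentrations.

```latex
\begin{align*}
[D{:}P] &= [D{:}X] \\
K_{eq,p}\,[D][P] &= K_{eq,x}\,[D][X] \\
z\,K_{eq,x}\,[P] &= K_{eq,x}\,[X] \\
[X] &= z\,[P]
\end{align*}
```

For z = 4 this reproduces the statement that the drug must be present at a 4-fold higher concentration than the protein.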

The system kinetic analysis becomes more complex if more than one
protein or drug molecule is bound by each DNA molecule. As an
example, if UL9 binds as a dimer,

$$D + 2P \rightleftharpoons DP_2$$

then the affinity constant becomes dependent on the square of the
protein concentration:

$$K = [DP_2]/([D][P]^2)$$

The same reasoning holds true for the DNA-binding test molecule, X;
if,

$$D + 2X \rightleftharpoons DX_2$$

then the affinity constant becomes dependent on the square of the
drug concentration:

$$K = [DX_2]/([D][X]^2)$$

Similarly, if the molar ratio of DNA to DNA-binding test molecule
was 1:3, the affinity constant would be dependent on the cube of
the drug concentration.
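
The consequence of this power dependence can be illustrated numerically. The sketch below assumes all-or-none binding of n ligand molecules per DNA molecule (no partially occupied intermediates) and uses free ligand concentration. The function names and values are hypothetical.

```python
def bound_to_free_ratio(k, ligand_free, n):
    """[DL_n]/[D] for K = [DL_n]/([D][L]^n), with n ligands per DNA."""
    return k * ligand_free ** n


def fraction_dna_bound(k, ligand_free, n):
    """Fraction of DNA in the DL_n complex (all-or-none binding assumed)."""
    ratio = bound_to_free_ratio(k, ligand_free, n)
    return ratio / (1.0 + ratio)


# Doubling the free ligand concentration multiplies the bound:free
# ratio by 2**n, e.g. 4-fold for a dimer and 8-fold for a 1:3 complex.
for n in (1, 2, 3):
    low = bound_to_free_ratio(1.0, 0.5, n)
    high = bound_to_free_ratio(1.0, 1.0, n)
    print(n, high / low)
```

A steeper response of the bound:free ratio to ligand concentration is therefore one experimental indication that more than one molecule binds per DNA.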

Experimentally, the ratio of molar components can be determined.
Given the chemical equation:

$$xD + yP \rightleftharpoons D_xP_y,$$

the affinity constant may be described as

$$K = [D_xP_y]/([D]^x[P]^y)$$

where [] indicates concentration, D = DNA, P = protein, x = number
of DNA molecules per DNA:protein complex, and y = number of protein
molecules per DNA:protein complex. By determining the ratio of
DNA:protein complex to free DNA, one can solve for x and y:

if a = the fraction of DNA that is free, then the fraction of DNA
that is bound can be described as 1-a; x and y can then be solved
for if the ratio of DNA:protein complex to free DNA is known for
more than one DNA concentration. This is because the affinity
constant should not vary at different DNA concentrations.
Therefore,

$$K_{D1} = K_{D2}$$

Substituting the right side of the equation above,

$$[D1_xP_y]/([D1]^x[P]^y) = [D2_xP_y]/([D2]^x[P]^y).$$

Because the concentration of components in the assay can be varied
and are known, the molar ratio of the components can be determined.
Therefore, $[D1_xP_y]$ and $[D2_xP_y]$ can be described as
$(1-a_1)[x_1]$ and $(1-a_2)[x_2]$, respectively, and [D1] and [D2]
can be described as $(a_1)[x_1]$ and $(a_2)[x_2]$, respectively.
[P] remains constant and is described as (y)-(y/x)(1-a)(x), where y
is the total protein concentration and (y/x)(1-a)(x) is the protein
complexed with DNA.
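
The idea that the correct stoichiometry is the one giving the same affinity constant at different DNA concentrations can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the original: one DNA molecule per complex (x = 1), simulated rather than measured free-DNA fractions, concentrations in nM, and hypothetical function names throughout. Note that the comparison is only informative when the DNA concentration is high enough to deplete free protein noticeably.

```python
def solve_complex(k, d_total, p_total, y, iterations=200):
    """Equilibrium concentration c of the D:P_y complex, by bisection.

    Solves k = c / ((d_total - c) * (p_total - y*c)**y) for c.
    """
    lo, hi = 0.0, min(d_total, p_total / y)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        # Positive residual: the trial complex concentration is too low.
        residual = k * (d_total - mid) * (p_total - y * mid) ** y - mid
        if residual > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def affinity_estimate(d_total, p_total, free_fraction, y):
    """K from a free-DNA fraction, assuming y proteins per DNA molecule."""
    complex_conc = (1.0 - free_fraction) * d_total  # bound DNA
    d_free = free_fraction * d_total                # free DNA
    p_free = p_total - y * complex_conc             # total minus complexed protein
    if p_free <= 0:
        return None  # this y would require more protein than was added
    return complex_conc / (d_free * p_free ** y)


def best_stoichiometry(measurements, p_total, candidates=(1, 2, 3, 4)):
    """Return the y for which K varies least across DNA concentrations."""
    best_y, best_spread = None, float("inf")
    for y in candidates:
        estimates = [affinity_estimate(d, p_total, a, y) for d, a in measurements]
        if any(e is None for e in estimates):
            continue
        spread = max(estimates) / min(estimates)
        if spread < best_spread:
            best_y, best_spread = y, spread
    return best_y


# Simulated data for a dimer (hypothetical values; K in nM^-2).
TRUE_K, TRUE_Y, P_TOTAL = 1.0, 2, 2.0
measurements = []
for d_total in (0.5, 2.0):
    c = solve_complex(TRUE_K, d_total, P_TOTAL, TRUE_Y)
    measurements.append((d_total, (d_total - c) / d_total))

print(best_stoichiometry(measurements, P_TOTAL))
```

The candidate stoichiometry whose computed affinity constants agree across the two DNA concentrations is taken as the best description of the complex.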

The system kinetic analyses become more complex if the inhibition
is allosteric (non-competitive inhibition) rather than competition
by steric hindrance. Nonetheless, the probability that the relative
effect of an inhibitor on different test sequences is due to its
relative and differential affinity to the different test sequences
is fairly high. This is particularly true in the assays in which
all sequences within an ordered set (e.g., possible sequences of a
given length or all possible variations of a certain base
composition and defined length) are tested. In short, if the effect
of inhibition in the assay is particularly strong for a single
sequence, then it is likely that the inhibitor binds that
particular sequence with higher affinity than any of the other
sequences. Furthermore, while it may be difficult to determine the
absolute affinity of the inhibitor, the relative affinities have a
high probability of being reasonably accurate. This information
will be most useful in facilitating, for instance, the refinement
of molecular modeling systems.

H. The Use of the Assay under Conditions of Very High Protein
Concentration.

When the screening protein is added to the assay system at very
high concentrations (i.e., high enough to force binding to
non-specific sites), the protein binds to non-specific sites on the
oligonucleotide as well as the screening sequence. This has been
demonstrated using band shift gels: when serial dilutions are made
of the protein and mixed with a fixed concentration of
oligonucleotide, no binding (as seen by a band shift) is observed
at very high dilutions (e.g., 1:100,000), a single band shift is
observed at moderate dilutions (e.g., 1:100) and a smear, migrating
higher than the single band observed at moderate dilutions, is
observed at high concentrations of protein (e.g., 1:10). The
observation of a smear is indicative of a mixed population of
complexes, all of which presumably have the screening protein
binding to the screening sequence with high affinity, but in
addition have a larger number of proteins bound with markedly lower
affinity to other sites.

Some of the low affinity binding proteins are likely bound to the
test sequence. For example, when using the UL9-based system, the
low affinity binding proteins are likely UL9 or, less likely,
glutathione-S-transferase: these are the only proteins in the assay
mixture. These proteins are significantly more sensitive to
interference by a molecule binding to the test sequence for two
reasons. First, the interference is likely to be by direct steric
hindrance and does not rely on induced conformational changes in
the DNA; secondly, the protein is a low affinity binding protein
because the test site is not a cognate-binding sequence. In the
case of UL9, the difference in affinity between the low affinity
binding and the high affinity binding appears to be at least two
orders of magnitude. The filter binding assays capture more
DNA:protein complexes when more protein is bound to the DNA. The
relative results are accurate, but under moderate protein
concentrations, not all of the bound DNA (as demonstrated by band
shift assays) will bind to the filter unless there is more than one
DNA:protein complex per oligonucleotide (e.g., in the case of UL9,
more than one UL9:DNA complex). This makes the assay exquisitely
sensitive under conditions of high protein concentration. For
instance, when actinomycin binds DNA at a test site under
conditions where there is one DNA:UL9 complex per oligonucleotide,
a preference for binding GC-rich oligonucleotides has been
observed; under conditions of high protein concentration, where
more than one DNA:UL9 complex is found per oligonucleotide, this
binding preference is even more apparent. These results suggest
that the effect of actinomycin D on a test site that is weakly
bound by protein may be more readily detected than the effect of
actinomycin D on the adjacent screening sequence. Therefore,
employing high protein concentrations may increase the sensitivity
of the assay.

III. Amplification-Based Selection Technique to Determine the
Sequence Preferences of DNA-Binding Molecules.

A. Design of Test Oligonucleotides.

The above-described assay can be coupled to amplification methods
(in one embodiment, polymerase chain reaction (Mullis, et al.;
Mullis; Innis, et al.)) to achieve identification of the sequences
to which binding of a test molecule is most preferred.

In this embodiment of the present invention, a double-stranded test
oligonucleotide is synthesized that contains the following
elements:

(i) the binding site for a DNA-binding protein (for example, UL9),
i.e., the screening site,

(ii) adjacent the screening site, a test site composed of more than
two base pairs and preferably less than 20 base pairs (most
preferably 4-12 bases), and

(iii) means to isolate selected sequences for amplification, such
as a sufficient number of bases flanking the test site sequences to
function as priming sites for polymerase chain reaction
amplification or restriction sites useful to facilitate cloning.

Priming sites can also be used as primer binding sites for dideoxy
sequencing reactions and may contain restriction endonuclease
cleavage sites to facilitate cloning manipulations.

The double-stranded test oligonucleotide can be generated by
second-strand synthesis using a primer complementary to the priming
site at the 3' end of the top-strand of the test oligonucleotide.
Alternatively, both strands can be generated by other means, such
as chemical synthesis, and the double-stranded test
oligonucleotides can be generated by hybridization of the strands.

An example of one such test oligonucleotide is shown in Figure 29A
(SEQ ID NO:630, SEQ ID NO:631 and SEQ ID NO:632). A specific
example of a test oligonucleotide is shown in Figure 29B (SEQ ID
NO:633), where X=4. All possible 256 four base pair sequences are
represented at equimolar levels within the pool of oligonucleotides
generated by this sequence design.

Another example of such a test oligonucleotide sequence is shown in
Figure 29C (SEQ ID NO:634), for an 8 base pair test sequence. In
this pool of mixed sequences, all possible 8 base pair sequences
(4^8 = 65,536) are present in equimolar amounts.
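
The pool sizes quoted for these designs follow from enumerating every sequence of a given length over the four bases. The snippet below is a small sketch of that enumeration; the function name is an illustrative assumption, and the sequences are treated as top-strand strings only.

```python
from itertools import product

BASES = "ACGT"


def all_test_sequences(length):
    """Return every possible top-strand test sequence of the given length."""
    return ["".join(bases) for bases in product(BASES, repeat=length)]


four_mers = all_test_sequences(4)
print(len(four_mers))   # number of 4 base pair test sequences
print(4 ** 8)           # number of 8 base pair test sequences
```

The pool grows 4-fold with each added base pair, which is why the mixed-pool approach becomes attractive for longer test sites.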

A second set of test oligonucleotides may be constructed in which
the test site is placed on the other side of the DNA-binding
protein recognition site (e.g., Figure 29D, SEQ ID NO:635).

For any single-stranded test oligonucleotide pool, the
single-stranded molecules are annealed to a primer and the bottom
strands are enzymatically synthesized by primer extension
reactions. One advantage of using the assay/amplification
PCR-cycling embodiment of the present invention is that it is
convenient to work with larger test sequences in this embodiment.
This protocol is geared to determining the highest affinity binding
sequences and is not capable of determining the rank of all test
sequences nor of identifying low affinity binding sites: such
ranking can be determined by screening individual sequences as
described above.

B. Applying the Assay to the Mixed Pools of Test Oligonucleotides.

Using double-stranded test oligonucleotides, such as those just
described, the basic assay is performed essentially as described
above (Section I), typically without the use of radioactive
detection systems. As previously discussed, a number of DNA:protein
interactions may be used in this assay system. One example of such
a system is the interaction of the DNA-binding domain of UL9 (or
UL9-COOH) with its cognate recognition sequence.

In this embodiment of the present invention, UL9-COOH is added to
the test oligonucleotide pool (for example, 256 four base pair
sequences are represented at equimolar levels within the pool of
oligonucleotides described above) in UL9 binding buffer.
DNA-binding molecules are tested for the ability to differentially
disrupt the binding of the UL9 DNA:protein complex by binding to
the test sequence. After the addition of the test molecule or test
mixture (e.g., a fermentation broth or fungal extract), the assay
mixture is incubated for a desired time, then passed through a
nitrocellulose filter. DNA:protein (such as DNA:UL9) complexes are
captured on the filter. DNA that is not bound by protein passes
through the filter (i.e., the filtrate) (step 1). The volume of the
assay is adjusted to accommodate the amount required for the
filtering process: that is, taking into consideration the losses
incurred during the filtering process.

C. Amplification.

In one embodiment, the DNA present in the filtrate is amplified
using the polymerase chain reaction (PCR) technology (Mullis;
Mullis, et al.; Perkin Elmer-Cetus). An aliquot of the resulting
PCR-amplified material is cycled through the DNA:protein binding
assay again (step 2), then PCR-amplified again (step 3). Steps 1-3
are repeated several times using each subsequent filtrate. After
each PCR amplification, part of the PCR-amplified material is
retained for sequencing analysis. The result of the repeated
cyclings through the assay/amplification process is that the test
oligonucleotide sequences that are amplified contain test sequences
that are preferred binding sites for the test molecules. Through
subsequent rounds of assay/amplification, these oligonucleotides
are amplified to represent a larger and larger percent of the total
population of amplified DNA molecules.
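
The enrichment produced by repeated assay/amplification rounds can be illustrated with a toy model. The sketch below assumes that each sequence passes into the filtrate with a fixed probability per round and that amplification simply renormalizes the pool without bias. The sequence labels and probabilities are hypothetical and are not taken from the experiments described here.

```python
def enrich(frequencies, filtrate_probability, rounds):
    """Track pool composition over repeated filter/amplify rounds.

    frequencies          : dict of sequence label -> starting fraction of pool
    filtrate_probability : dict of sequence label -> chance of reaching the
                           filtrate (protein displaced by the test molecule)
    """
    history = [dict(frequencies)]
    current = dict(frequencies)
    for _ in range(rounds):
        # Filtering: only DNA released from protein reaches the filtrate.
        passed = {s: current[s] * filtrate_probability[s] for s in current}
        # Amplification: modelled as unbiased renormalization of the filtrate.
        total = sum(passed.values())
        current = {s: amount / total for s, amount in passed.items()}
        history.append(current)
    return history


start = {"preferred": 0.25, "other_1": 0.25, "other_2": 0.25, "other_3": 0.25}
p_filtrate = {"preferred": 0.5, "other_1": 0.1, "other_2": 0.1, "other_3": 0.1}

for round_number, pool in enumerate(enrich(start, p_filtrate, rounds=4)):
    print(round_number, round(pool["preferred"], 3))
```

In this model the preferred sequence comes to dominate within a few rounds, while low-affinity sequences are progressively lost, consistent with the protocol identifying the highest affinity sites rather than ranking all sequences.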

In addition to PCR, the DNA present in the filtrate can be
amplified by other methods as well. For example, the DNA present in
the filtrate can be cloned into a selected vector (such as phage
vectors, e.g., lambda-based, or standard cloning vectors, e.g.,
pBR322- or pUC-based). The cloned sequences are then transformed
into an appropriate host organism in which the selected vector can
replicate (for example, bacteria or yeast). The transformed host
organism is cultured with concurrent amplification of the vectors
containing the cloned sequences. The vectors are then isolated by
standard procedures (Maniatis, et al.; Sambrook, et al.; Ausubel,
et al.). Typically, the cloned sequences, originally obtained from
the DNA filtrate, are obtained from the vector by restriction
endonuclease digestion and size-fractionation (for example,
electrophoretic separation of the digestion products followed by
electroelution of the cloned sequences of interest) (Ausubel, et
al.). These isolated amplified test oligonucleotide sequences can
then be recycled through subsequent rounds of assay/amplification
as described above.

In another embodiment, the oligonucleotide sequences present in the
original DNA filtrate can be isolated, sequenced and amplified by
in vitro synthesis of copies of the oligonucleotides.

D. Sequencing of Amplified DNA.

Samples from each cycle are sequenced using, for example,
radio-labeled primers and dideoxy sequencing methodologies (Sanger)
or the chemical methodologies outlined by Maxam and Gilbert. If the
amplified sequences are not sufficiently resolved to obtain
unambiguous sequence information, then the DNA is further purified
and sequenced. For example, the DNA is cleaved at the restriction
endonuclease sites within the primer sequences and subcloned into a
convenient sequencing vector, such as "BLUESCRIPT" (Stratagene, La
Jolla, CA). The sequencing vectors carrying the amplified inserts
are transformed into bacteria. The resulting cloned vectors are
isolated and sequenced (in the case of "BLUESCRIPT," using the
commercially available primers and protocols).

IV. Modifications of Test Oligonucleotides and other Useful
DNA:Protein Interactions

One class of DNA:protein interactions that may be useful in the
assay of the present invention is the restriction
endonuclease/restriction site class of DNA:protein interactions. In
the absence of divalent cations, restriction endonucleases bind DNA
but have no enzymatic activity (cleavage of DNA does not take place
without divalent cations). This allows the assay of the present
invention to be performed using a restriction endonuclease with its
cognate binding site as the screening sequence. The use of the
restriction endonuclease/restriction site interaction as the basis
of the present assay is described in greater detail in Section
VI.B.4(c).

The test oligonucleotides of the present invention can be modified
to contain two different DNA:protein screening systems, i.e., two
different screening sequences with their respective cognate binding
proteins. In the assay described above, the UL9 screening sequence
lies on one side of and immediately adjacent to the test sequence.
A second screening sequence, such as a restriction endonuclease
recognition sequence (restriction site), can be introduced
immediately adjacent to the other side of the test sequence.

Several restriction enzymes may recognize the same restriction
site. These enzymes are not identical, however, in that the
cleavage sites may be at the 5' end, the center, or the 3' end of
the recognition sequence. For this reason, a restriction site that
is recognized by more than one restriction enzyme may be
incorporated adjacent to the test site. This allows a single pool
of test oligonucleotides to be used in assays employing three
different DNA:protein interactions: the screening sequence has the
same sequence but the restriction endonuclease used in the assay
system is different in each case. Using this method to design test
oligonucleotides, the UL9 screening sequence may be placed on one
side of a test sequence and a restriction site screening sequence
(having three cognate binding proteins) may be placed on the other
side of the test sequence. Such a test oligonucleotide construction
allows 4 different DNA:protein assay interaction systems to be
employed with a single pool of test sequences.

One example of test oligonucleotides using several different
DNA:protein interaction systems is shown in Figure 30. The top
strands of the pool of test oligonucleotides shown in Figure 30
have 6 base pair test sequences (NNNNNN) and represent synthetic
pools of all possible 4096 test sequences. The remainder of the
nucleotide sequence is fixed. The test oligonucleotides contain the
UL9 recognition sequence, 5'-CGTTCGCACTT-3' (underlined), on one
side of the test sequence and a restriction endonuclease binding
sequence, 5'-GGTACC-3' (bold), on the other side of the test site.
The restriction endonuclease recognition sequence is recognized by
the three different restriction endonucleases Asp718, RsaI and
KpnI. In Figure 30 the UL9 binding site (screening sequence) is
located 3' of the test sequence: the UL9 binding site (screening
sequence) can also be located 5' of the test sequence.
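
The core of such a dual-screening-site design can be sketched in a few lines. The example below assembles only the three elements named in the text (restriction site, test site, UL9 site) on the top strand, with the UL9 site 3' of the test sequence. It assumes the sites are immediately adjacent with no spacer bases, and it omits the flanking primer sequences of Figure 30, which are not reproduced in the text. The function and constant names are illustrative.

```python
from itertools import product

UL9_SITE = "CGTTCGCACTT"     # UL9 recognition sequence, 5'->3'
RESTRICTION_SITE = "GGTACC"  # site bound by Asp718, RsaI and KpnI


def build_core_pool(test_length=6):
    """Top-strand cores: restriction site + test site + UL9 site.

    Flanking primer sequences are deliberately left out (not given here).
    """
    pool = []
    for bases in product("ACGT", repeat=test_length):
        test_site = "".join(bases)
        pool.append(RESTRICTION_SITE + test_site + UL9_SITE)
    return pool


pool = build_core_pool()
print(len(pool))
print(pool[0])
```

Placing the UL9 site 5' of the test sequence instead would only require reversing the order of concatenation.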

The shorter sequences shown above the 5' and 3' ends of the test
oligonucleotides are primer sequences useful for sequencing and PCR
amplification. The primer sequences contain commonly used
restriction endonuclease sites for the purpose of subcloning into
sequencing vectors.

Performing the assay with two or more different protein/screening
sequence systems allows the confirmation of putative high affinity
binding between a test compound and specific test sequences.

Alternatively, since there is no assurance that a test molecule
that binds the test sequence will have the same effect on protein
binding at both adjacent flanking sequences, simultaneous use of
both test systems may reduce the number of false negatives detected
in an assay. For example, a test molecule may not affect the
binding of protein at one screening site but may affect the binding
of a different protein at the other screening site.

V. Capture/Detection Systems.

As an alternative to the above described band shift gels and filter
binding assays, inhibitors can be monitored by measuring either the
level of unbound DNA in the presence of test molecules or mixtures
or the level of DNA:protein complex remaining in the presence of
test molecules or mixtures. Measurements may be made either at
equilibrium or, in a kinetic assay, prior to the time at which
equilibrium is reached. The type of measurement is likely to be
dictated by practical factors, such as the length of time to
equilibrium, which will be determined by both the kinetics of the
DNA:protein interaction as well as the kinetics of the DNA:drug
interaction. The results (i.e., the detection of DNA-binding
molecules and/or the determination of their sequence preferences)
should not vary with the type of measurement taken (kinetic or
equilibrium).

Figure 2 illustrates an assay for detecting inhibitory molecules
based on their ability to preferentially hinder the binding of a
DNA-binding protein. In the presence of an inhibitory molecule (X)
the equilibrium between the DNA-binding protein and its binding
site (screening sequence) is disrupted. The DNA-binding protein (O)
is displaced from DNA (/) in the presence of inhibitor (X); the DNA
free of protein or, alternatively, the DNA:protein complexes, can
then be captured and detected.

For maximum sensitivity, unbound DNA and DNA:protein complexes
should be sequestered from each other in an efficient and rapid
manner. The method of DNA capture should allow for the rapid
removal of the unbound DNA from the protein-rich mixture containing
the DNA:protein complexes.

Even if the test molecules are specific in their
interaction with DNA, they may have relatively low affinity
and they may also be weak binders of non-specific DNA or
have non-specific interactions with DNA at low
concentrations. In either case, their binding to DNA may
only be transient, much like the transient binding of the
protein in solution. Accordingly, one feature of the assay
is to take a molecular snapshot of the equilibrium state of
a solution comprised of the test oligonucleotide DNA, the
protein, and the inhibitory test molecule. In the presence
of an inhibitor, the amount of DNA that is not bound to
protein will be greater than in the absence of an
inhibitor. Likewise, in the presence of an inhibitor, the
amount of DNA that is bound to protein will be less than in
the absence of an inhibitor.
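
The shift in the equilibrium "snapshot" can be illustrated
with a minimal numerical sketch. It assumes a simple
competitive model in which protein and inhibitor are in
large excess over the DNA probe, so that the fraction of
DNA bound is [P]/([P] + Kd(1 + [I]/Ki)). The model, the
function name, and every numerical value are illustrative
assumptions and are not taken from the experiments
described herein.

```python
def fraction_bound(protein, kd, inhibitor=0.0, ki=1.0):
    """Fraction of DNA probe held in DNA:protein complex at equilibrium.

    Assumes protein and inhibitor are in excess over DNA and that the
    inhibitor acts competitively (an illustrative simplification).
    All concentrations share the same arbitrary units.
    """
    apparent_kd = kd * (1.0 + inhibitor / ki)
    return protein / (protein + apparent_kd)


# Hypothetical values chosen only for illustration.
bound_without = fraction_bound(protein=9.0, kd=1.0)
bound_with = fraction_bound(protein=9.0, kd=1.0, inhibitor=5.0, ki=1.0)

print(f"bound, no inhibitor:   {bound_without:.2f}")
print(f"bound, with inhibitor: {bound_with:.2f}")
print(f"free,  no inhibitor:   {1 - bound_without:.2f}")
print(f"free,  with inhibitor: {1 - bound_with:.2f}")
```

Under these assumed values the free DNA rises and the
protein-bound DNA falls when the inhibitor is present, which
is the qualitative behavior either capture system relies on.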

Any method used to separate the DNA:protein complexes from
unbound DNA should be rapid, because when the capture
system is applied to the solution (if the capture system is
irreversible), the ratio of unbound DNA to DNA:protein
complex will change at a predetermined rate, based purely
on the off-rate of the DNA:protein complex. This step,
therefore, determines the limits of background. Unlike the
protein and inhibitor, the capture system should bind
rapidly and tightly to the DNA or DNA:protein complex. The
longer the capture system is left in contact with the
entire mixture of unbound DNA and DNA:protein complexes in
solution, the higher the background, regardless of the
presence or absence of inhibitor.
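
The dependence of background on contact time can be
sketched as follows, assuming first-order dissociation of
the DNA:protein complex and an irreversible capture step.
The 7.5 minute half-life and the contact times are
hypothetical values chosen only to show the trend.

```python
import math


def fraction_dissociated(contact_time_min, half_life_min):
    """Fraction of DNA:protein complex that has dissociated (and so can
    be captured as unbound DNA) after the given contact time, assuming
    first-order dissociation and irreversible capture."""
    k_off = math.log(2) / half_life_min
    return 1.0 - math.exp(-k_off * contact_time_min)


# Hypothetical half-life and contact times, for illustration only.
for t in (0.5, 1, 2, 5, 10):
    lost = fraction_dissociated(t, half_life_min=7.5)
    print(f"{t:>4} min contact: {lost:.1%} of complex lost to capture")
```

The printed fractions grow with contact time whether or not
an inhibitor is present, which is why a brief, fixed
exposure to the capture system keeps background low.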

Two exemplary capture systems are described below for use
in the assay of the present invention. One capture system
has been devised to capture unbound DNA (Section V.A). The
other has been devised to capture DNA:protein complexes
(Section V.B). Both systems are amenable to high throughput
screening assays. The same detection methods (Section V.C)
can be applied to molecules captured using either capture
system.

A. Capture of Unbound DNA.

One capture system that has been developed in the course of
experiments performed in support of the present invention
utilizes a streptavidin/biotin interaction for the rapid
capture of unbound DNA from the protein-rich mixture, which
includes unbound DNA, DNA:protein complexes, excess protein
and the test molecules or test mixtures. Streptavidin binds
with extremely high affinity to biotin (Kd = 10^-15 M)
(Chaiet, et al.; Green). Accordingly, two advantages of the
streptavidin/biotin system are that binding between the two
molecules can be rapid and the interaction is the strongest
known non-covalent interaction.

In this detection system a biotin molecule is covalently
attached within the oligonucleotide screening sequence
(i.e., the DNA-binding protein's binding site). This
attachment is accomplished in such a manner that the
binding of the DNA-binding protein to the DNA is not
destroyed. Further, when the protein is bound to the
biotinylated sequence, the protein prevents the binding of
streptavidin to the biotin. In other words, the DNA-binding
protein is able to protect the biotin from being recognized
by the streptavidin. This DNA:protein interaction is
illustrated in Figure 3.

The capture system is described herein for use with the
UL9/oriS system described above. The following general
testing principles can, however, be applied to analysis of
other DNA:protein interactions. The usefulness of this
system depends on the biophysical characteristics of the
particular DNA:protein interaction.

1. Modification of the Protein Recognition Sequence with
Biotin.

The recognition sequence for the binding of 

the UL9 (Koff, et al.) protein is underlined in Figure 

Oligonucleotides were synthesized that contain the UL9
binding site and were site-specifically biotinylated at a
number of locations throughout the binding sequence (SEQ ID
NO:614; Example 1, Figure 4). These biotinylated
oligonucleotides were then used in band shift assays to
determine the ability of the UL9 protein to bind to the
oligonucleotide. These experiments, using the biotinylated
probe and a non-biotinylated probe as a control,
demonstrate that the presence of a biotin at the #8-T
(biotinylated deoxyuridine) position of the bottom strand
meets the requirements listed above: the presence of a
biotin moiety at the #8 position of the bottom strand does
not markedly affect the specificity of UL9 for the
recognition site. Further, in the presence of bound UL9,
streptavidin does not recognize the presence of the biotin
moiety in the oligonucleotide. Biotinylation at other A or
T positions did not have the two necessary characteristics
(i.e., UL9 binding and protection from streptavidin):
biotinylation at the adenosine in position #8 of the top
strand prevented the binding of UL9; biotinylation of
either adenosines or thymidines (top or bottom strand) at
positions #3, #4, #10, or #11 all allowed binding of UL9,
but in each case, streptavidin also was able to recognize
the presence of the biotin moiety and thereby bind the
oligonucleotide in the presence of UL9.

The above result (the ability of UL9 to bind to an
oligonucleotide containing a biotin within the recognition
sequence and to protect the biotin from streptavidin) was
unexpected in that methylation interference data (Koff, et
al.) suggest that methylation of the deoxyguanosine
residues at positions #7 and #9 of the recognition sequence
(on either side of the biotinylated deoxyuridine) blocks
UL9 binding. In these methylation interference experiments,
guanosines are methylated by dimethyl sulfate at the N7
position, which corresponds structurally to the 5-position
of the pyrimidine ring at which the deoxyuridine is
biotinylated. These moieties all protrude into the major
groove of the DNA. The methylation interference data
suggest that the #7 and #9 position deoxyguanosines are
contact points for UL9; it was therefore unexpected that
the presence of a biotin moiety between them would not
interfere with binding.

The binding of the full length protein was relatively
unaffected by the presence of a biotin at position #8
within the UL9 binding site. The rate of dissociation was
similar for full length UL9 with both biotinylated and
non-biotinylated oligonucleotides. However, the rate of
dissociation of the truncated UL9-COOH polypeptide was
faster with the biotinylated oligonucleotides than with
non-biotinylated oligonucleotides (for non-biotinylated
oligonucleotides the rate was comparable to that of the
full length protein with either DNA).

The binding conditions were optimized for UL9-COOH so that
the half-life of the truncated UL9 from the biotinylated
oligonucleotide was 5-10 minutes (optimized conditions are
given in Example 4), a rate compatible with a mass
screening assay. The use of multi-well plates to conduct
the DNA:protein assay of the present invention is one
approach to mass screening.

2. Capture of Site-Specific Biotinylated Oligonucleotides.

The streptavidin:biotin interaction can be employed in
several different ways to remove unbound DNA from the
solution containing the DNA, protein, and test molecule or
mixture. Magnetic polystyrene or agarose beads, to which
streptavidin is covalently attached or attached through a
covalently attached biotin, can be exposed to the solution
for a brief period, then removed by use, respectively, of
magnets or a filter mesh. Magnetic streptavidinated beads
are currently the method of choice. Streptavidin has been
used in many of these experiments, but avidin is equally
useful.

An example of a second method for the removal of unbound
DNA is to attach streptavidin to a filter by first linking
biotin to the filter, binding streptavidin, then blocking
nonspecific protein binding sites on the filter with a
nonspecific protein such as albumin. The mixture is then
passed through the filter: unbound DNA is captured and the
protein-bound DNA passes through the filter. This method
can give high background due to partial retention of the
DNA:protein complex on the filter.

One convenient method to sequester captured DNA is the use
of streptavidin-conjugated superparamagnetic polystyrene
beads as described in Example 7. These beads are added to
the assay mixture to capture the unbound DNA. After capture
of DNA, the beads can be retrieved by placing the reaction
tubes in a magnetic rack, which sequesters the beads on the
reaction chamber wall while the assay mixture is removed
and the beads are washed. The captured DNA is then detected
using one of several DNA detection systems, as described
below.

Alternatively, avidin-coated agarose beads can be used.
Biotinylated agarose beads (immobilized D-biotin, Pierce)
are bound to avidin. Avidin, like streptavidin, has four
binding sites for biotin. One of these binding sites is
used to bind the avidin to the biotin that is coupled to
the agarose beads via a 16 atom spacer arm; the other
biotin binding sites remain available. The beads are mixed
with binding mixtures to capture biotinylated DNA (Example
7). Alternative methods (Harlow, et al.) to the bead
capture methods just described include the following
streptavidinated or avidinated supports: low-protein
binding filters or 96-well plates.

B. Capture of DNA:Protein Complexes.

The amount of DNA:protein complex remaining in the assay
mixture in the presence of an inhibitory molecule can also
be determined as a measure of the relative effect of the
inhibitory molecule. A net decrease in the amount of
DNA:protein complex in response to a test molecule is an
indication of the presence of an inhibitor. DNA molecules
that are bound to protein can be captured on nitrocellulose
filters. Under low salt conditions, DNA that is not bound
to protein freely passes through the filter. Thus, by
passing the assay mixture rapidly through a nitrocellulose
filter, the DNA:protein complexes and unbound DNA molecules
can be rapidly separated. This has been accomplished on
nitrocellulose discs using a vacuum filter apparatus or on
slot blot or dot blot apparatuses (all of which are
available from Schleicher and Schuell, Keene, NH). The
assay mixture is applied to and rapidly passes through the
wetted nitrocellulose under vacuum conditions. Any
apparatus employing nitrocellulose filters or other filters
capable of retaining protein while allowing free DNA to
pass through the filter would be suitable for this system.

C. Detection Systems.

For either of the above capture methods, the amount of DNA
that has been captured is quantitated. The method of
quantitation depends on how the DNA has been prepared. If
the DNA is radioactively labelled, beads can be counted in
a scintillation counter, or autoradiographs can be taken of
dried gels or nitrocellulose filters. The amount of DNA has
been quantitated in the latter case by a densitometer
(Molecular Dynamics, Sunnyvale, CA); alternatively, filters
or gels containing radiolabeled samples can be quantitated
using a phosphoimager (Molecular Dynamics). Further, the
captured DNA may be detected using a chemiluminescent or
colorimetric detection system.

Radiolabelling and chemiluminescence (i) are very
sensitive, allowing the detection of sub-femtomole
quantities of oligonucleotide, and (ii) use
well-established techniques. In the case of
chemiluminescent detection, protocols have been devised to
accommodate the requirements of a mass-screening assay.
Non-isotopic DNA detection techniques have principally
incorporated alkaline phosphatase as the detectable label
given the ability of the enzyme to give a high turnover of
substrate to product and the availability of substrates
that yield chemiluminescent or colored products.



1. Radioactive Labeling.

Many of the experiments described above for UL9
DNA:protein-binding studies have made use of radiolabelled
oligonucleotides. The techniques involved in radiolabelling
of oligonucleotides have been discussed above. A specific
activity of 10^8-10^9 dpm per µg DNA is routinely achieved
using standard methods (e.g., end-labeling the
oligonucleotide with adenosine γ-[32P]-5' triphosphate and
T4 polynucleotide kinase). This level of specific activity
allows small amounts of DNA to be measured either by
autoradiography of gels or filters exposed to film or by
direct counting of samples in scintillation fluid.



2. Chemiluminescent Detection.
For chemiluminescent detection, digoxigenin-labelled
oligonucleotides (Example 1) can be detected using the
chemiluminescent detection system "SOUTHERN LIGHTS,"
developed by Tropix, Inc. (Bedford, MA). The detection
system is diagrammed in Figures 11A and 11B. The technique
can be applied to detect DNA that has been captured on
beads, on filters, or in solution.

Alkaline phosphatase is coupled to the captured DNA without
interfering with the capture system. To do this, several
methods derived from commonly used ELISA techniques
(Harlow, et al.; Pierce, Rockford IL) can be employed. For
example, an antigenic moiety is incorporated into the DNA
at sites that will not interfere with (i) the DNA:protein
interaction, (ii) the DNA:drug interaction, or (iii) the
capture system. In the UL9 DNA:protein/biotin system the
DNA has been end-labelled with digoxigenin-11-dUTP
(dig-dUTP) and terminal transferase (Example 1, Figure 4).
After the DNA was captured and removed from the DNA:protein
mixture, an anti-digoxigenin-alkaline phosphatase
conjugated antibody (Boehringer Mannheim, Indianapolis IN)
was reacted with the digoxigenin-containing
oligonucleotide. The antigenic digoxigenin moiety was
recognized by the antibody-enzyme conjugate. The presence
of dig-dUTP altered neither the ability of UL9-COOH protein
to bind the oriS (SEQ ID NO:601)-containing DNA nor the
ability of streptavidin to bind the incorporated biotin.

Captured DNA was detected using the alkaline
phosphatase-conjugated antibodies to digoxigenin as
follows. One chemiluminescent substrate for alkaline
phosphatase is
3-(2'-spiroadamantane)-4-methoxy-4-(3"-phosphoryloxy)
phenyl-1,2-dioxetane disodium salt (AMPPD) (Example 7).
Dephosphorylation of AMPPD results in an unstable compound,
which decomposes, releasing a prolonged, steady emission of
light at 477 nm. Light measurement is very sensitive and
can detect minute quantities of DNA (e.g., 10^2-10^3
attomoles) (Example 7).

Colorimetric substrates for the alkaline phosphatase system
have also been tested. While the colorimetric substrates
are usable in the present assay system, use of the light
emission system is more sensitive.

An alternative to the above biotin capture system is to use
digoxigenin in place of biotin to modify the
oligonucleotide at a site protected by the DNA-binding
protein at the assay site; biotin is then used to replace
the digoxigenin moieties in the above-described detection
system. In this arrangement the anti-digoxigenin antibody
is used to capture the oligonucleotide probe when it is
free of bound protein. Streptavidin conjugated to alkaline
phosphatase is then used to detect the presence of captured
oligonucleotides.

D. Alternative Methods for Detecting Molecules that
Increase the Affinity of the DNA-Binding Protein for its
Cognate Site.

In addition to identifying molecules or compounds that
cause a decreased affinity of the DNA-binding protein for
the screening sequence, molecules may be identified that
increase the affinity of the protein for its cognate
binding site. In this case, leaving the capture system for
unbound DNA in contact with the assay for increasing
amounts of time allows the

establishment of a fixed half-life for the DNA:protein
complex (for example, using SEQ ID NO:601/UL9). In the
presence of a stabilizing molecule, the half-life, as
detected by the capture system time points, will be
lengthened.

Using the capture system for DNA:protein complexes to
detect molecules that increase the affinity of the
DNA-binding protein for the screening sequence requires
that an excess of unlabeled oligonucleotide containing the
UL9 binding site (but not the test sequences) be added to
the assay mixture. This is, in effect, an off-rate
experiment. In this case, the control sample (no test
molecules or mixtures added) will show a fixed off-rate
(for example, samples would be taken at fixed intervals
after the addition of the unlabeled competitor DNA
molecule and applied to nitrocellulose, and a decreasing
amount of radiolabeled DNA:protein complex would be
observed). In the presence of a DNA-binding test molecule
that enhanced the binding of UL9, the off-rate would be
decreased (i.e., the amount of radiolabeled DNA:protein
complexes observed would not decrease as rapidly at the
fixed time points as in the control sample).
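
As a sketch of how such an off-rate experiment might be
analyzed, the following assumes first-order dissociation
and fits the slope of ln(complex remaining) against time.
The time points, rate constants, and function name are
hypothetical and serve only to illustrate the comparison
between a control sample and a sample containing a
stabilizing test molecule.

```python
import math


def estimate_off_rate(times, complex_remaining):
    """Least-squares slope of ln(complex) versus time.

    Returns k_off (per unit time), assuming first-order dissociation
    of the radiolabeled DNA:protein complex.
    """
    logs = [math.log(c) for c in complex_remaining]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return -sxy / sxx


# Hypothetical time courses: minutes after adding unlabeled competitor,
# and the fraction of labeled complex still retained on the filter.
times = [0, 2, 4, 8, 16]
control = [math.exp(-0.10 * t) for t in times]
with_test_molecule = [math.exp(-0.04 * t) for t in times]

k_control = estimate_off_rate(times, control)
k_test = estimate_off_rate(times, with_test_molecule)
print(f"control half-life:            {math.log(2) / k_control:.1f} min")
print(f"half-life with test molecule: {math.log(2) / k_test:.1f} min")
```

A smaller fitted off-rate in the presence of the test
molecule than in the control would indicate enhanced
binding of the protein to its site.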

VI. Utility.

A. The Usefulness of Sequence-Specific DNA-Binding
Molecules.

The present invention defines a high throughput in vitro
screening assay to test large libraries of biological or
chemical mixtures for the presence of DNA-binding molecules
having sequence binding preference. The assay is also
capable of determining the sequence-specificity and
relative affinity of known DNA-binding molecules or
purified unknown DNA-binding molecules. Sequence-specific
DNA-binding molecules are of particular interest for
several reasons, which are listed here. These reasons, in
part, outline the rationale for determining the usefulness
of DNA-binding molecules as therapeutic agents:

First, for a given DNA:protein interaction, there are
generally several thousands fewer target DNA-binding
sequences per cell than protein molecules that bind to the
DNA. Accordingly, even fairly toxic molecules might be
delivered in sufficiently low concentration to exert a
biological effect by binding to the target DNA sequences.

Second, DNA has a better-defined structure than RNA or
protein. Since the general structure of DNA has less
tertiary structural variation, identifying or designing
specific binding molecules should be easier for DNA than
for either RNA or protein. Double-stranded DNA is a
repeating structure of deoxyribonucleotides that stack atop
one another to form a linear helical structure. In this
manner, DNA has a regularly repeating "lattice" structure
that makes it particularly amenable to molecular modeling
refinements and hence, drug design and development.

Third, since many single genes (i.e., genes which have only
1 or 2 copies in the cell) are transcribed into more than
one, and potentially as many as thousands of, RNA
molecules, each of which may be translated into many
proteins, targeting any DNA site, whether it is a
regulatory sequence, non-coding sequence or a coding
sequence, may require a much lower drug dose than targeting
RNAs or proteins. Proteins (e.g., enzymes, receptors, or
structural proteins) are currently the targets of most
therapeutic agents. More recently, RNA molecules have
become the targets for antisense or ribozyme therapeutic
molecules.

Fourth, blocking the function of an RNA that encodes a
protein, or of the protein itself, when that protein
regulates several cellular genes may have detrimental
effects, particularly if some of the regulated genes are
important for the survival of the cell. However, blocking a
DNA-binding site that is specific to a single gene
regulated by such a protein results in reduced toxicity.

An example situation is HNF-1 binding to Hepatitis B virus
(HBV): HNF-1 binds an HBV enhancer sequence and stimulates
transcription of HBV genes (Chang, et al.). In a normal
cell HNF-1 is a nuclear protein that appears to be
important for the regulation of many genes, particularly
liver-specific genes (Courtois, et al.). If molecules were
isolated that specifically bound to the DNA-binding domain
of HNF-1, all of the genes regulated by HNF-1 would be
down-regulated, including both viral and cellular genes.
Such a drug could be lethal since many of the genes
regulated by HNF-1 may be necessary for liver function.
However, the assay of the present invention presents the
ability to screen for a molecule that could distinguish the
HNF-1 binding region of the Hepatitis B virus DNA from
cellular HNF-1 sites by, for example, including divergent
flanking sequences when screening for the molecule. Such a
molecule would specifically block HBV expression without
affecting cellular gene expression.

B. General Applications of the Assay.

General applications of the assay include but are not
limited to: screening libraries of unknown chemicals,
either biological or synthetic compounds, for
sequence-specific DNA-binding molecules; determining the
sequence-specificity or preference and/or relative
affinities of DNA-binding molecules; testing of modified
derivatives of DNA-binding molecules for altered
specificity or affinity; using the assay in secondary
confirmatory or mechanistic experiments; using the data
generated from the above applications to refine the
predictive capabilities of molecular modeling systems; and
using the refined molecular modeling systems to generate a
new "alphabet" of DNA-binding subunits that can be
polymerized to make novel heteropolymers designed de novo
to bind specific DNA target sites.

1. Mass-Screening of Libraries for the Presence of
Sequence-Specific DNA-Binding Molecules.

Many organizations (e.g., the National Institutes of
Health, pharmaceutical and chemical corporations) have
large libraries of chemical or biological compounds from
synthetic processes or fermentation broths or extracts that
may contain as yet unidentified DNA-binding molecules. One
utility of the assay is the mass-screening of these
libraries of different broths, extracts, or mixtures to
detect the specific samples that contain the DNA-binding
molecules. Once the specific mixtures that contain the
DNA-binding molecules have been identified, the assay is
further useful in aiding the purification of the
DNA-binding molecule from the crude mixture. As
purification schemes are applied to the mixture, the assay
can be used to test the fractions for DNA-binding activity.
The assay is amenable to high throughput (e.g., a 96-well
plate format automated on robotics equipment such as a
Beckman Biomek workstation [Beckman, Palo Alto, CA] with
detection using semi-automated plate-reading densitometers,
luminometers, or phosphoimagers).

The concentration of protein used in mass-screening is
determined by the sensitivity desired. The screening of
known compounds, as described in Section VI.B.2, is
typically performed in protein excess at a protein
concentration high enough to produce 90-95% of the DNA
bound in DNA:protein complex. The assay is very sensitive
to discriminatory inhibition at this protein concentration.
For some mass-screening, it may be desirable to operate the
assay under higher protein concentration, thus decreasing
the sensitivity of the assay so that only fairly high
affinity molecules will be detected: for example, when
screening fermentation broths with the intent of
identifying high affinity binding molecules. The range of
sensitivities in the assay will be determined by the
absolute concentration of protein used.
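
The relationship between protein concentration and assay
sensitivity can be sketched numerically. The example
assumes protein excess and simple one-to-one binding, so
that the fraction of DNA bound is [P]/([P] + Kd); the
dissociation constant, the two-fold weakening attributed to
a test molecule, and the function names are hypothetical
and are used only to show the trend.

```python
def protein_for_fraction_bound(fraction, kd):
    """Protein concentration giving the requested fraction of DNA bound,
    assuming protein excess and 1:1 binding: fraction = [P]/([P] + Kd)."""
    return kd * fraction / (1.0 - fraction)


def drop_in_bound(protein, kd, fold_weakening):
    """Decrease in fraction bound when a test molecule weakens the
    apparent protein affinity by the given fold (hypothetical effect)."""
    before = protein / (protein + kd)
    after = protein / (protein + kd * fold_weakening)
    return before - after


kd = 1.0  # arbitrary units; illustrative only
for f in (0.90, 0.95, 0.99):
    p = protein_for_fraction_bound(f, kd)
    change = drop_in_bound(p, kd, fold_weakening=2.0)
    print(f"{f:.0%} bound needs [P] = {p:.0f} x Kd; "
          f"2-fold weakening lowers the bound fraction by {change:.3f}")
```

Under these assumptions the same test molecule produces a
smaller change in bound DNA as the protein concentration is
raised, which is the loss of sensitivity described above.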

One utility of the method of the present invention, under
conditions using a relatively insensitive system (high
[P]:[D] ratio), is as a screening system for novel
restriction enzymes. In this case, an ability to
discriminate between slight differences in affinity to
different sequences may not be necessary or desirable.
Restriction enzymes have highly discriminatory recognition
properties: the affinity constants of a restriction
endonuclease for its specific recognition sequence and for
non-specific sequences differ by orders of magnitude. The
assay may be used to screen bacterial extracts for the
presence of novel restriction endonucleases. The 256 test
oligonucleotides described in Example 10, for example, may
be used to screen for novel restriction endonucleases with
4 bp recognition sequences. The advantages of the system
are that all possible 4 bp sequences are screened
simultaneously, that is, it is not limited to
self-complementary sequences. Further, any lack of
specificity (such as more than one binding site) is
uncovered during the primary screening assay.
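
The size of the complete 4 bp sequence set, and the smaller
subset of self-complementary sequences, can be checked with
a short enumeration. This is an illustrative sketch only;
it generates bare 4 bp sequences and does not reproduce the
actual oligonucleotides of Example 10.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Every possible 4 bp test sequence.
all_sites = ["".join(p) for p in product("ACGT", repeat=4)]

# Sequences identical to their own reverse complement.
self_complementary = [s for s in all_sites if s == reverse_complement(s)]

print(len(all_sites))           # 256
print(len(self_complementary))  # 16
```

Only 16 of the 256 sequences are self-complementary, so a
screen restricted to such sites would leave 240 of the
possible 4 bp sequences untested.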



2. Directed Screening.
The assay of the present invention is also useful for
screening molecules that are currently described in the
literature as DNA-binding molecules but with uncertain
DNA-binding sequence specificity (i.e., having either no
well-defined preference for binding to specific DNA
sequences or having certain higher affinity binding sites
but without defining the relative preference for all
possible DNA binding sequences). The assay can be used to
determine the specific binding sites for DNA-binding
molecules, among all possible choices of sequence that bind
with high, low, or moderate affinity to the DNA-binding
molecule. Actinomycin D, Distamycin A, and Doxorubicin
(Example 6) all provide examples of molecules with these
modes of binding. Many anti-cancer drugs, such as
Doxorubicin (see Example 6), show binding preference for
certain identified DNA sequences, although the absolute
highest and lowest specificity sequences have yet to be
determined because, until the invention described herein,
methods (Salas and Portugal; Cullinane and Phillips;
Phillips; and Phillips, et al.) for detecting differential
affinity DNA-binding sites for any drug were limited.
Doxorubicin is one of the most widely used anti-cancer
drugs currently available. As shown in Example 6,
Doxorubicin is known to bind some sequences preferentially.
Another example of such sequence binding preference is
Daunorubicin (Chen, et al.), which differs slightly in
structure from Doxorubicin (Goodman, et al.). Both
Daunorubicin and Doxorubicin are members of the
anthracycline antibiotic family; antibiotics in this
family, and their derivatives, are among the most important
newer antitumor agents (Goodman, et al.).

The assay of the present invention allows the sequence
preferences or specificities of DNA-binding molecules to be
determined. The DNA-binding molecules for which sequence
preference or specificity can be determined may include
small molecules such as aminoacridines and polycyclic
hydrocarbons, planar dyes, various DNA-binding antibiotics
and anticancer drugs, as well as DNA-binding
macromolecules, such as peptides and polymers that bind to
nucleic acids (e.g., DNA and the derivatized homologs of
DNA that bind to the DNA helix).

The molecules that can be tested in the assay for sequence
preference/specificity and relative affinity to different
DNA sites include both major and minor groove binding
molecules as well as intercalating and non-intercalating
DNA binding molecules.

3. Molecules Derived from Known DNA-binding Molecules.

The assay of the present invention facilitates the
identification of different binding activities by molecules
derived from known DNA-binding molecules. An example of
this would be to identify and test derivatives of
anti-cancer drugs that have DNA-binding activity and then
test for anti-cancer activity through, for example, a
battery of assays performed by the National Cancer
Institute (Bethesda MD). Further, the assay of the present
invention can be used to test derivatives of known
anti-cancer agents to examine the effect of the
modifications on DNA-binding activity and specificity. In
this manner, the assay may reveal activities of anti-cancer
agents, and derivatives of these agents, that facilitate
the design of DNA-binding molecules with therapeutic or
diagnostic applications in different fields, such as
antiviral or antimicrobial therapeutics. The
binding-activity information for any DNA-binding molecule,
obtained by application of the present assay, can lead to a
better understanding of the mode of action of more
effective therapeutics.

4. Secondary Assays.

As described above, the assay of the present invention is
used (i) as a screening assay to detect novel DNA-binding
molecules, or (ii) to determine the relative specificity
and affinity of known molecules (or their derivatives). The
assay may also be used in confirmatory studies or studies
to elucidate the binding characteristics of DNA-binding
molecules. Using the assay as a tool for secondary studies
can be of significant importance to the design of novel
DNA-binding molecules with altered or enhanced binding
specificities and affinities.

a.) Confirmatory Studies.
The assay of the present invention can be used in 
competition studies to confirm and refine the original 
direct binding data obtained from the assay. 

The primary screening assay does not provide for the direct
determination of relative absolute affinities of test
molecules for different test sequences. A competition
method has been developed that aids in the interpretation
and confirmation of the primary screening assay. The
competition method also provides a means for determining
the minimum difference in absolute affinities of any test
sequences for a given test molecule.

Sequences of interest are tested for their ability to
compete with the test oligonucleotide for binding a test
molecule of interest. In this method, DNA molecules that
contain sequences that are high affinity binding sites for
the DNA-binding test molecule compete effectively with the
test oligonucleotide for the binding of the test molecule.
DNA molecules that contain sequences that are low affinity
binding sites for the test molecules are ineffective
competitors. In effect, the fold-difference in
concentration required between a high affinity competitor
DNA and a low affinity competitor DNA, where the competitor
is required to compete with the test oligonucleotide for
the binding of the DNA-binding test molecule, should be
proportional to the difference in affinity between the two
competitor DNA molecules.
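
The proportionality just described can be expressed as a
one-line calculation. The sketch below assumes that the
concentration of each competitor giving the same degree of
competition has already been determined; the function name
and the concentrations are hypothetical.

```python
def relative_affinity(conc_high_affinity, conc_low_affinity):
    """Fold-difference in competitor concentration needed for the same
    degree of competition, taken here as an estimate of the
    fold-difference in affinity between the two competitor DNAs."""
    if conc_high_affinity <= 0 or conc_low_affinity <= 0:
        raise ValueError("concentrations must be positive")
    return conc_low_affinity / conc_high_affinity


# Hypothetical concentrations (same units) giving equal competition.
print(relative_affinity(conc_high_affinity=2.0, conc_low_affinity=200.0))
```

With these assumed values the low affinity competitor must
be present at 100 times the concentration of the high
affinity competitor, suggesting roughly a 100-fold
difference in affinity for the test molecule.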

Any test oligonucleotide may be used in the competition
study. However, in practice, since most secondary screening
will be used to examine the putative high affinity binding
test sequences, the secondary competition assay is
typically used to test a competitor oligonucleotide which
is a putative high affinity test sequence.

In the competition assay, the assay conditions are
essentially the same as the conditions used in the primary
screening assay. The assay components are mixed, with the
exception of the DNA. The mixture includes protein, buffer
and the DNA-binding test molecule (control samples lack the
test molecule). A test oligonucleotide is labeled (for
example, using a radioisotope, although any of the described
capture/detection systems should be effective in the
competition study). The DNA sample, including the
radiolabeled test oligonucleotide and unlabeled competitor
DNA, is added to the assay mixture. Typically, the
competitor DNA of interest is added to different reactions
over a range of competitor concentrations. Two controls are
commonly run: (i) no DNA-binding test molecule added; and
(ii) test DNA but no competitor DNA added.

The reactions are incubated for the desired time and the
DNA:protein complexes are separated from free DNA (i.e., DNA
not associated with protein) by passing the mixture through
nitrocellulose. Other capture systems, such as the
biotin/streptavidin system discussed in Section V, are also
effective. The amount of radiolabeled test oligonucleotide
bound by protein (i.e., bound to the filter) is indicative
of the effect of the competitor.

One example of a competition assay is as follows. A test
oligonucleotide containing the test sequence TTAC ranks as a
high affinity binding site for a test molecule. The TTAC
test oligonucleotide is radiolabeled and mixed with
non-radiolabeled competitor DNAs that contain, for example,
a putative high affinity binding site (the same site, TTAC,
is one example) or a putative low affinity binding site
(e.g., CCCC). In the absence of any competing nonlabeled DNA
or DNA-binding test molecule, the amount of radiolabeled
DNA:protein complex observed (called r%) is arbitrarily
established as 100%. The concentration of the protein used
in this experiment is high enough to bind most of the
radiolabeled test oligonucleotide in the absence of test
molecules or competing DNA molecules (this is essentially
the same concentration as used in the primary screening
assay).

The test molecule is added to the reaction at a
concentration sufficient to markedly reduce r%, the amount
of observed DNA:protein complex. The greater the reduction
in signal, the more easily competition is observed. The
amount of competitor DNA needed to observe competition is
proportional to the amount of DNA-binding test molecule
used; therefore, the amount of test molecule used should be
sufficient to reduce r% to between approximately 10% and
70%. The effect of an effective competitor, such as TTAC, is
to cause r% to rise towards 100%.

The competition for test molecule binding is between the
non-labeled competitor DNA and the radiolabeled test
oligonucleotide. As the competitor DNA concentration
increases, the test molecule binds to the competitor DNA and
is effectively removed from solution. Accordingly, the test
molecule is no longer able to block the binding of the
protein to the radiolabeled oligonucleotide. A less
effective competitor, typically a competitor DNA with low
affinity for the test molecule, will compete less
effectively for the DNA-binding test molecule, even at
substantially higher concentrations than the high affinity
competitor. A completely ineffective competitor, i.e., one
that does not bind the test molecule, would not cause the r%
value to change, even at high concentrations of the
competitor DNA.
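
The r% bookkeeping described above can be sketched in a few lines of Python. This is an illustration only: the function names, the use of filter-bound counts as the raw signal, and every numeric value are assumptions made for the example and are not taken from the assay data.

```python
def r_percent(bound_counts, reference_counts):
    """Express filter-bound signal as a percentage of the reference reaction
    (no test molecule, no competitor DNA), which is defined as 100%."""
    if reference_counts <= 0:
        raise ValueError("reference_counts must be positive")
    return 100.0 * bound_counts / reference_counts


def in_working_window(r, low=10.0, high=70.0):
    """True if the test molecule alone lowers r% into the approximate
    10%-70% range suggested in the text."""
    return low <= r <= high


# Hypothetical counts for illustration only.
reference = 20000.0        # protein + labeled test oligonucleotide
with_drug = 9000.0         # plus test molecule, no competitor
with_competitor = 17500.0  # plus test molecule and an effective competitor

r_drug = r_percent(with_drug, reference)
r_comp = r_percent(with_competitor, reference)
print(f"r% with test molecule only: {r_drug:.1f}")
print(f"r% with competitor added:   {r_comp:.1f}")
print("test molecule dose within working window:", in_working_window(r_drug))
```

In this sketch an effective competitor shows up as r% moving from the reduced value back towards 100%, while an ineffective competitor would leave r% unchanged.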

When a competitor DNA has some affinity for the test
molecule, competition (r% rising towards 100%) would be
observed at some competitor DNA concentration. The
difference in concentration between two competing DNA
sequences required to achieve an equivalent r% (e.g., 90%)
should reflect the relative difference in absolute affinity
between the two competitor DNA molecules. For example, if
5 μM TTAC is required to achieve a change in r% from 50% to
90% in the presence of a test molecule and 200 μM CCCC is
required to achieve the same change in r%, then the fold
difference in affinity between TTAC and CCCC for the test
molecule is 200/5 = 40-fold.
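
The arithmetic of this example can be written out as a short Python sketch. The function names and the titration values below are hypothetical; only the 5 μM and 200 μM figures come from the text. The log-scale interpolation is one plausible way to estimate the concentration that reaches a target r% and is not described in the source.

```python
import math


def fold_difference(conc_weak, conc_strong):
    """Ratio of competitor concentrations that give the same change in r%.

    Both values must use the same units (e.g., micromolar).
    """
    if conc_weak <= 0 or conc_strong <= 0:
        raise ValueError("concentrations must be positive")
    return conc_weak / conc_strong


def conc_at_target(titration, target_r):
    """Interpolate (on a log10 concentration scale) the competitor
    concentration at which r% reaches target_r.

    titration: list of (concentration, r%) pairs, sorted by increasing
    concentration, with r% rising as competitor is added.
    """
    for (c0, r0), (c1, r1) in zip(titration, titration[1:]):
        if r0 < r1 and r0 <= target_r <= r1:
            frac = (target_r - r0) / (r1 - r0)
            log_c = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    raise ValueError("target r% is not bracketed by the titration data")


# Worked example from the text: 5 uM TTAC vs. 200 uM CCCC for the same
# change in r% (50% -> 90%).
print(fold_difference(200.0, 5.0))  # 40.0

# Hypothetical titration data (concentration in uM, r%), illustration only.
ttac = [(0.5, 55.0), (5.0, 90.0), (50.0, 98.0)]
cccc = [(20.0, 55.0), (200.0, 90.0), (2000.0, 98.0)]
ratio = conc_at_target(cccc, 90.0) / conc_at_target(ttac, 90.0)
print(round(ratio, 1))
```

Comparing the two competitors at the same target r% keeps the estimate independent of the absolute signal in any one reaction.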

In the context of screening distamycin with all 256
possible four base pair test sequences (Example 10), the
confirmatory assay can be used (i) to confirm the rankings
observed in the assay, (ii) to refine the rankings among the
5-10 highest ranked binders (which show no statistical
difference in rank with data from 4 experiments), and (iii)
to resolve perceived discrepancies in the assay data. All of
these goals may be accomplished using a competition
experiment which determines the relative ability of test
sequences to compete for the binding of distamycin.

The perceived discrepancy in the distamycin experiment is
as follows: test oligonucleotides that were complementary to
most of the top-ranking test sequence oligonucleotides
scored poorly in the assay (Examples 10 and 11). This result
was unexpected since it is unlikely that the affinity of
distamycin for binding a test site depends on the
orientation of the screening site to the test site. More
likely, the assay detects the binding of distamycin when the
molecule is bound to the test oligonucleotide in one
orientation, but fails to detect the binding of distamycin
when the test sequence is in the other orientation. A
competition study will resolve this question, since the
binding of distamycin to a competitor sequence will be
orientation-independent; the competition does not depend on
the mechanism of the assay.
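
As a bookkeeping aid for pairing each test sequence with its complement, the following Python sketch enumerates the 256 four base pair sites and groups each with its reverse complement. It assumes that "complementary" here means the same duplex read 5' to 3' on the opposite strand; that reading, and all names in the code, are assumptions of this sketch rather than statements from the Examples.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the opposite strand of seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# All 4^4 = 256 possible four base pair test sites.
all_sites = ["".join(p) for p in product("ACGT", repeat=4)]

# Group every site with its reverse complement under a single key.
pairs = {}
for site in all_sites:
    rc = reverse_complement(site)
    pairs.setdefault(min(site, rc), set()).update({site, rc})

# Sites that are their own reverse complement have no distinct partner.
self_complementary = [s for s in all_sites if s == reverse_complement(s)]

print(reverse_complement("TTAC"))  # GTAA
print(len(all_sites), len(pairs), len(self_complementary))  # 256 136 16
```

A table of this kind makes it straightforward to look up, for any high-ranking site, the partner whose score should be examined by competition.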

For the competition experiment, the assay may be performed
under any conditions suitable for the detection of drug
binding. When these conditions are established, different
competitor DNAs are added to the assay system to determine
their relative ability to compete for drug binding with the
radiolabeled test oligonucleotide in the assay system.

The competitor DNAs may be any sequence of interest.
Several classes of DNA may be tested as competitor molecules
including, but not limited to, the following: genomic DNAs,
synthetic DNAs (e.g., poly(dA), poly(dI-dC), and other DNA
polymers), test oligonucleotides of varying sequences, or
any molecule of interest that is thought to compete for
distamycin binding.

When using the competition assay to verify the results of
a 256 oligonucleotide panel screen (like Example 10), the
following criteria are useful for selecting the competitor
test oligonucleotides:

(i) sequences that rank high in the assay but whose
relative binding affinities do not differ from each other to
a statistically significant degree, in order to determine
their relative affinity with greater precision;

(ii) sequences that are purported by other techniques
(e.g., footprinting or transcriptional block analysis) to be
high affinity binding sites, in order to compare the results
of those techniques with the screening assay results;

(iii) sequences that are complementary to test sequences
that rank high in the assay, in order to determine whether
these test sequences are false negatives; and

(iv) sequences of any rank in the assay, in order to
confirm the assay results.

Several methods may be used to perform the competition
study as long as the relative affinities of the competing
DNA molecules are detectable. One such method is described
in Example 14. In this example, the concentration of the
assay components (drug, protein, and DNA) is held constant
relative to those used in the original screening assay, but
the molar ratio of the test oligonucleotide to the
competitor oligonucleotides is varied.

Another method for performing a competition assay is to
hold the concentrations of protein, drug and initial amount
of test oligonucleotide constant, then add a variable
concentration of competitor DNA. In this design, the protein
and drug concentrations must be sufficiently high to allow
the addition of further competitor DNA without (i)
decreasing the amount of DNA:protein complex in the absence
of drug to a level that is unsuitable for detection of
DNA:protein complex, and (ii) increasing the amount of
DNA:protein complex in the presence of drug to a level that
is unsuitable for the detection of drug binding. The window
between detectable DNA:protein complex and detectable effect
of the drug must be wide enough to determine differences
among competitor DNAs.

In any competition method, it is important that the
relative concentrations of the competing DNA molecules are
accurately determined. One method for accomplishing this is
to tracer-label the competitor molecules to a low specific
activity with a common radiolabeled primer (Example 14). In
this manner, the competitor molecules have the same specific
activity, but are not sufficiently radioactive (200-fold
less than the test oligonucleotide) to contribute to the
overall radioactivity in the assay.

b.) Secondary Studies to Elucidate Binding Characteristics.
The studies outlined in Section VI.B.4.a describe methods of
determining some of the binding processes of distamycin A.
The assay of the present invention may also be used to
explore mechanistic questions about distamycin binding.

For example, several of the complements of the putative
high affinity binding sites for distamycin have low scores
in the assay. As described above, this may imply
directionality in binding. The results may also imply that
the test sites are not equal with respect to the effect
exerted on UL9-COOH binding. Oligonucleotides can be
designed to test the hypothesis of directionality.

The basic test oligonucleotide has the structure presented
in Figure 27A (SEQ ID NO:621). In one scenario, the score in
the binding assay is high, i.e., distamycin has the greatest
effect, when the test sequence is XYZZ (Figure 27A, with the
base X complementary to the base Y and the base Q
complementary to the base Z), and the complement (Figure
27B; SEQ ID NO:622) scores low. These results imply that the
test sites are not equivalent with respect to their effect
on UL9; otherwise the right site would have the effect in
one oligonucleotide and the left site would have the effect
in the other. These results further suggest that the effect
of distamycin is directional. The only assumption is that
distamycin should bind with the same affinity to the
XYZZ/QQXY sequence (Figures 27A and 27B) regardless of its
position or orientation in the oligonucleotide. Since the
scores are derived at equilibrium, this is likely to be the
case.

To test the hypothesis that one site is effective in the
assay, oligonucleotides may be designed that have the UL9
site inverted with respect to the test sites (Figures 27C
and 27D; SEQ ID NO:623 and SEQ ID NO:624, respectively). If
only one site is active with respect to UL9 and if the
Figure 27A oligo was most effective in binding distamycin,
then oligo C should be less active in the assay than oligo
D; in other words, flipping the UL9 site will result in QQXY
ranking high and XYZZ ranking low.



Finally, to determine the "direction" of distamycin
binding, the test sequences may be mixed and the binding
site inverted, as shown in the four oligonucleotides
presented in Figures 27E, 27F, 27G and 27H. Alternatively,
one test site or the other could be deleted from the test
oligonucleotide.

This type of analysis provides an example of the
usefulness of the assay in determining binding properties of
DNA-binding drugs.



c.) Restriction Endonucleases as Indicator Proteins in the
Assay. Restriction enzymes and their recognition sequences
provide other DNA:protein interactions that are useful as
indicator proteins and screening sequences. Such secondary
screening assays are performed using the same criteria used
to establish conditions for the primary screening assay
(described in Example 4). The assay conditions can be varied
to accommodate different DNA:protein interactions, as long
as the assay system follows the functional criteria
discussed above (Section I).

One limitation of using restriction endonucleases in the
method of the present invention is that the assay buffer
should not contain divalent cations. In the absence of
divalent cations, the enzymes will bind the appropriate
recognition sequence, but not cleave the DNA. In the
presence of divalent cations, the test oligonucleotide can
be cleaved at or near the protein binding site.

By using different indicator proteins, a different
recognition sequence can be used to flank the test site.
This variation allows the resolution of questions regarding
the potential binding of a test molecule to a site internal
to any single screening sequence. For example, consider the
assay system in which the UL9 protein and its recognition
sequence are used as the indicator protein:screening
sequence interaction. In this system, if the highest
affinity binding site for a test molecule is TTAC, then
several test sequences may be predicted to rank high in the
assay system; several of these test sequences are presented
in Figure 31. In Figure 31, the test site is shown in bold
and the potential binding site for the test molecule is
shown underlined.

One test oligonucleotide on which the DNA-binding test
molecule would be predicted to have a high level of effect
is the oligonucleotide containing the test site TTAC (Figure
31). However, since the UL9 recognition sequence contains
the sequence TT flanking the test site, several other test
oligonucleotides might also be expected to have high
activity in the assay (see Figure 31).

By using a different DNA:protein interaction as the
indicator system in a secondary screening assay, the "false
positives" shown for TACN and ACNN (shown in Figure 31) can
be identified. The recognition sequence for the protein in a
secondary screening assay simply needs to differ from the
UL9 screening sequence in the region flanking the test site.

Restriction endonucleases provide an entire class of
different DNA:protein interactions with a wide array of
available sequences that can be used in this manner. For
example, SmaI recognizes the sequence 5'-CCCGGG-3'. Using
the SmaI:DNA interaction and the same test sequences
presented in Figure 31, the resulting test oligonucleotides
would have the test sequences presented in Figure 32. As can
be seen from a comparison of Figures 31 and 32, changing the
screening sequence from the UL9-binding sequence to the
SmaI-binding sequence eliminates the potential test molecule
binding sites internal to the screening sequence (e.g.,
compare TACN and ACNN in the figures).
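
The effect of the flanking sequence can be illustrated with a small Python sketch that looks for copies of a binding motif straddling the junction between the screening sequence and the test site. The flank "NNNNTT" is a placeholder, not the UL9 recognition sequence: only its terminal TT is taken from the text, and placing that TT immediately 5' of the test site is an assumption chosen to be consistent with TACN and ACNN being the listed false positives. Only one flank is modeled, and the function names are hypothetical.

```python
from itertools import product


def junction_hits(flank, test_site, motif):
    """Offsets (relative to the start of the test site) of motif copies in
    flank + test_site that include at least one base of the flank and at
    least one base of the test site."""
    combined = flank + test_site
    hits = []
    start = combined.find(motif)
    while start != -1:
        if start < len(flank) < start + len(motif):
            hits.append(start - len(flank))
        start = combined.find(motif, start + 1)
    return hits


def false_positive_candidates(flank, motif="TTAC"):
    """Four base pair test sites, other than the motif itself, that recreate
    the motif across the flank/test-site junction."""
    sites = ("".join(p) for p in product("ACGT", repeat=4))
    return sorted(s for s in sites
                  if s != motif and junction_hits(flank, s, motif))


UL9_LIKE_FLANK = "NNNNTT"  # placeholder; only the terminal TT is from the text
SMAI_FLANK = "CCCGGG"      # SmaI recognition sequence

ul9_like = false_positive_candidates(UL9_LIKE_FLANK)
smai = false_positive_candidates(SMAI_FLANK)
print(len(ul9_like), ul9_like[:4])
print(len(smai))
```

Under these assumptions, the TT-terminated flank yields candidates of the forms ACNN and TACN, while the SmaI flank yields none, which mirrors the comparison of Figures 31 and 32 described in the text.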

The use of different DNA-binding proteins as indicator
proteins in the assay is also applicable to the PCR-based
test oligonucleotide selection technology (Section III).

5. Generation of Binding Data and Refinement of Molecular
Modeling Systems.

The assay of the present invention generates data which
can be applied to the refinement of molecular modeling
systems that address DNA structural analysis; the data are
also useful in the design and/or refinement of DNA-binding
drugs. Traditionally, mass screening has been the only
reasonable method for discovering new drugs. Modern rational
drug design seeks to minimize laboratory screening. However,
ab initio rational drug design is difficult at this time
given (i) insufficiencies in the underlying theories used
for de novo design, and (ii) the computational intensity
which accompanies such design approaches.

The ab initio approach requires calculations from first
principles by quantum mechanics; such an approach is
expensive and time-consuming. The introduction of data
concerning the relative binding affinities of one or more
DNA-binding molecules to all 256 four base pair DNA
sequences allows the development, via molecular modeling, of
ad hoc protocols for DNA structural analysis and subsequent
DNA-binding drug design. The accumulation of data for the
DNA sequences to which small molecules bind is likely to
result in more accurate, less expensive molecular modeling
programs for the analysis of DNA.

The screening capacity of the assay of the present
invention is much greater than that of screening a single
DNA sequence with an individual cognate DNA-binding protein.
Direct competition assays involving individual
receptor:ligand complexes (e.g., a specific DNA:protein
complex) are most commonly used for mass screening efforts.
Each such assay requires the identification, isolation,
purification, and production of the assay components. In
particular, a suitable DNA:protein interaction must be
identified for each selected screening sequence. Using the
assay of the present invention, libraries of synthetic
chemicals or biological molecules can be screened to detect
molecules that have preferential binding to virtually any
specified DNA sequence, all using a single assay system.
When employing the assay of the present invention, secondary
screens involving the specific DNA:protein interaction may
not be necessary, since inhibitory molecules detected in the
assay may be tested directly on a biological system: for
example, for the ability to disrupt viral replication in a
tissue culture or animal model.

6. The Design of New DNA-Binding Heteropolymers Comprised
of Subunits Directed to Different DNA Sequences.

The assay of the present invention will improve the
predictive abilities of molecular modeling systems in two
ways. First, ad hoc methods of structural prediction will be
improved. Second, by employing pattern matching schemes, the
comparison of sequences having similar or different
affinities for a given set of DNA-binding molecules should
empirically reveal sets of sequences that have similar
structures (see Section VI.D, Using a Test Matrix).
Molecular modeling programs are "trained" using the
information concerning DNA-binding molecules and their
preferred binding sequences. With this information coupled
to the predictive power of molecular modeling programs, the
design of DNA-binding molecules (subunits) that could be
covalently linked becomes feasible.

These molecular subunits would be directed at defined
sections of DNA, with a subunit designed for each possible
DNA unit. For example, if single bases were the binding
target of the subunits, then four subunits would be
required, one to correspond to each base pair. These
subunits could then be linked together to form a DNA-binding
polymer, where the DNA binding preference of the polymer
corresponds to the sequence binding preferences of the
subunits in the particular order in which the subunits are
assembled.

Another example of such a polymer uses subunits whose
binding is directed at two base sections of DNA. In this
case, 4^2 = 16 subunits would be used, each subunit having a
binding affinity for a specific two base pair sequence
(e.g., AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA,
TC, TG, TT). If the polymers were to be comprised of
subunits targeted to 3 base pair sections of DNA, then
4^3 = 64 subunits would be prepared. The design of such
molecular subunits is dependent upon the establishment of a
refined database using empirical data derived by the method
of the present invention, as described in Section VI.B.
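
The subunit counts above follow from enumerating every sequence of a given length over four bases. The Python sketch below lists the target units for each subunit size and splits a target sequence into the ordered units that a linked polymer would need to recognize. The function names and the example target "TTACGG" are hypothetical, and the sketch treats subunit assignment as a simple non-overlapping partition, which is an assumption of the example.

```python
from itertools import product

BASES = "ACGT"


def target_units(k):
    """All 4**k sequences of length k, one per required subunit."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def split_into_units(target, k):
    """Split a target sequence into consecutive, non-overlapping units of
    length k, in the order in which subunits would be linked."""
    if len(target) % k != 0:
        raise ValueError("target length must be a multiple of k")
    return [target[i:i + k] for i in range(0, len(target), k)]


for k in (1, 2, 3):
    print(k, len(target_units(k)))  # 1 4, then 2 16, then 3 64

print(target_units(2)[:4])            # ['AA', 'AC', 'AG', 'AT']
print(split_into_units("TTACGG", 2))  # ['TT', 'AC', 'GG']
print(split_into_units("TTACGG", 3))  # ['TTA', 'CGG']
```

Larger subunit targets reduce the number of linkages needed for a given site but increase the size of the subunit set that must be designed.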

C. Sequences Targeted by the Assay.
The DNA:protein assay of the present invention has been
designed to screen for compounds that bind a full range of
DNA sequences that vary in length as well as complexity.
Sequence-specific DNA-binding molecules discovered by the
assay have potential usefulness as molecular reagents,
therapeutics, or therapeutic precursors. Sequence-specific
DNA-binding molecules are potentially powerful therapeutics
for essentially any disease or condition that in some way
involves DNA. Examples of test sequences for the assay
include: a) binding sequences of factors involved in the
maintenance or propagation of infectious agents, especially
viruses, bacteria, yeast and other fungi, b) sequences
causing the inappropriate expression of certain cellular
genes, and c) sequences involved in the replication of
rapidly growing cells. Furthermore, gene expression or
replication need not necessarily be disrupted by blocking
the binding of specific proteins. Specific sequences within
protein-coding regions of genes (e.g., oncogenes) are
equally valid test sequences since the binding of small
molecules to these sequences is likely to perturb the
transcription and/or replication of the region. Finally, any
molecules that bind DNA with some sequence specificity, that
is, not just to one particular test sequence, may still be
useful as anti-cancer agents. Several small molecules with
some sequence preference are already in use as anticancer
therapeutics. Molecules identified by the present assay may
be particularly valuable as lead compounds for the
development of congeners having either different specificity
or different affinity.

One advantage of the present invention is that the assay
is capable of screening for binding activity directed
against any DNA sequence. Such sequences can be medically
significant target sequences, scrambled or randomly
generated DNA sequences, or well-defined, ordered sets of
DNA sequences. Other sets could be used for screening for
molecules demonstrating sequence preferential binding (like
Doxorubicin) to determine the sequences with highest binding
affinity and/or to determine the relative affinities between
a large number of different sequences. Either approach is
useful for detecting and/or designing new therapeutic
agents. Section VI.C.3, "Theoretical Considerations for
Choosing Target Sequences", outlines the theoretical
considerations for choosing DNA target sites in a biological
system.

1. Medically Significant Target Sequences.

Few effective viral therapeutics are currently available;
yet several potential target sequences for antiviral
DNA-binding drugs have been well-characterized. Furthermore,
with the accumulation of sequence data on all biological
systems, including viral genomes, cellular genomes, and
pathogen genomes (bacteria, fungi, eukaryotic parasites,
etc.), the number of target sites for DNA-binding drugs will
increase greatly in the future.

There are numerous methods for identifying medically
significant target sequences for DNA-binding drugs,
including, but not limited to, the following. First,
medically significant target sequences are found in
pathogens of the biological kingdoms, for example in genetic
sequences that are key to biochemical pathways or
physiological processes. Second, a target is identified,
such as (i) a pathogen involved in an infectious disease, or
(ii) a biochemical pathway or physiological process of a
noninfectious disease, genetic condition, or other
biological process. Then specific genes important for the
survival of the pathogen or modulation of the endogenous
pathway involved in the target system are identified. Third,
specific target sequences are identified that affect the
expression or activity of a DNA molecule, such as genes or
sites involved in replication.

There are numerous pathogens that are potential targets
for DNA-binding drugs designed using the methods described
in this application. Table I lists a number of potential
target pathogens.



Table I: Pathogens 



VIRUSES

Retroviruses
Human
HIV I, II
HTLV I, II






Animal 

SIV 

STLV I 
FELV 
FIV 
BLV 

BIV (Bovine immunodeficiency virus) 
Lentiviruses 

Avian reticuloendotheliosis virus 



Avian sarcoma and leukosis viruses 
Caprine arthritis-encephalitis 
Equine infectious anemia virus

Maedi/visna of sheep 
MMTV (mouse mammary tumor virus) 
Progressive pneumonia virus of sheep 






Herpesviridae 
Human 



EBV 
CMV



HSV I, II 
VZV 
HH6 

Cercopithecine Herpes virus (B Virus)

([illegible] monkeys with infection into [illegible])

Animal 

Bovine Mammillitis virus 
Equine Herpes virus 
Equine coital exanthema virus 
Equine rhinopneumonitis virus 
Infectious bovine rhinotracheitis virus
Marek's disease virus of fowl
Turkey herpesvirus

Hepadnaviruses

Human 

HBV/HDV 
Animal 

Duck Hepatitis 
Woodchucks 
Squirrels 



Poxviridae 
Human 

Orf virus 
Cow Pox 
Variola virus 
Vaccinia 
Small Pox 

Pseudocowpox 



Poxviridae - continued 
Animal 

Bovine papular stomatitis virus 
Cowpox virus

Ectromelia virus (mouse pox) 
Fibroma viruses of rabbits/squirrels
Fowlpox
Lumpy skin disease of cattle virus
Myxoma virus
Pseudocowpox virus 
Sheep pox virus 
Swine pox



Papovaviridae

Human

BK virus
SV-40
JC virus 

Papillomaviruses (see list)

Animal 

Lymphotropic papovavirus (LPV), monkey
Bovine papillomavirus

Shope papillomavirus 

Adenoviridae 
Human 

Adenoviruses 1-4 
Animal 

Canine adenoviruses 2

Parvoviridae 
Human 

AAV (Adeno Associated Virus) 
B19 (human)

Animal 

FPV (Feline parvovirus) 
PPV (Porcine parvovirus) 
ADV (Aleutian disease, mink) 

Bovine Parvovirus 
Canine Parvovirus 
Feline panleukopenia virus 
Minute virus of mice 
Mink enteritis virus 




Group A Streptococci 

Agents responsible for: 

Streptococcal pharyngitis 
Cervical adenitis 

Otitis media 

Mastoiditis 

Peritonsillar abscesses 
Meningitis 
Peritonitis
Pneumonia

Acute glomerulonephritis 
Rheumatic fever 
Erythema nodosum 



Staphylococcus
aureus
epidermidis
saprophyticus
cohnii
haemolyticus
xylosus
warneri
capitis
hominis
simulans
saccharolyticus
auricularis

Agents responsible for: 
Furuncles
Carbuncles 
Osteomyelitis 
Deep tissue abscesses 
Wound infections 
Pneumonia 
Empyema 
Pericarditis 
Endocarditis 
Meningitis 
Purulent arthritis 
Enterotoxin in food poisoning 



Branhamella catarrhalis

Neisseria
gonorrhoeae
lactamica
sicca
subflava
mucosa
flavescens
cinerea
elongata
canis
meningitidis

Enteric Bacilli and Similar Gram-Negative Bacteria

Escherichia 

Proteus 

Klebsiella 

Pseudomonas aeruginosa 
Enterobacter
Citrobacter 
Proteus 



Providencia 
Bacteroides 
Serratia 

Pseudomonas (not aeruginosa) 
Acinetobacter 
Salmonella 
Shigella 
Aeromonas 
Moraxella 
Edwardsiella 
Ewingella 
Hafnia 
Kluyvera 
Morganella 
Plesiomonas 

Pseudomonas 

aeruginosa 
putida 
pseudomallei 
mallei 



Haemophilus 
ducreyi 
influenzae 

parainfluenzae



Bordetella pertussis 



Yersinia 

pestis (plague) 

pseudotuberculosis 

enterocolitica 






Francisella tularensis 



Pasteurella multocida 



Vibrio 

cholerae 
parahaemolyticus
fluvialis
furnissii 
mimicus 



Brucella 

melitensis 
abortus 
suis 
canis 



Bartonella bacilliformis 



Gardnerella vaginalis



Borrelia 

recurrentis 
hermsii 
duttoni 
crocidurae 

burgdorferi (Lyme disease) 






Bacillus 

anthracis 

cereus 

megaterium 

subtilis 

sphaericus 

circulans 

brevis 

lentiformis 

macerans 

pumilus 

thuringiensis 

larvae 

lentimorbus 

popilliae 



Streptobacillus moniliformis (rat bite fever) 



Spirillum minus (rat bite fever) 



Rothia dentocariosa 






Kurthia 



Clostridium 

botulinum 
novyi

bifermentans 



WO 94/14980 



PCT/US93/12388






Clostridium - continued 
histolyticum 
ramosum 
tetani 
perfringens 
novyi 

septicum 



Campylobacter
jejuni
fetus
hyointestinalis
fennelliae
cinaedi



Corynebacterium
ulcerans
pseudotuberculosis
JK
diphtheriae



Legionella
pneumophila
bozemanii
micdadei
feeleii
many others



Mycobacterium
tuberculosis
africanum
bovis
leprae
avium complex
kansasii
fortuitum complex
scrofulaceum
marinum
ulcerans



Actinomyces 



Bacteroides
fragilis

Fusobacterium
necrophorum
nucleatum



Peptostreptococcus 
Arachnia

Bifidobacterium



Propionibacterium 



Nocardia 



Treponema pallidum (syphilis) 






Rickettsiae
Typhus
R. prowazeki (epidemic)
R. prowazeki (Brill's disease)
R. typhi (endemic)
Spotted fever
R. rickettsi
R. sibiricus
R. conorii
R. australis
R. akari
Scrub typhus
R. tsutsugamushi
Q fever
Coxiella burnetii
Trench fever
Rochalimaea quintana



Chlamydiae
C. trachomatis
(blindness, pelvic inflammatory disease, LGV)

Mycoplasma
pneumoniae

Ureaplasma urealyticum



Cardiobacterium hominis 



Actinobacillus actinomycetemcomitans 



Kingella 



Capnocytophaga 



Pasteurella multocida 



Leptospira interrogans 






Listeria monocytogenes 



Erysipelothrix rhusiopathiae



Streptobacillus moniliformis 



Calymmatobacterium granulomatis 



Bartonella bacilliformis

Francisella tularensis



Salmonella typhi 



FUNGAL 






Actinomyces 

israelii 

naeslundii 

viscosus 

odontolyticus 

meyeri 
pyogenes 



Cryptococcus neoformans 



Blastomyces dermatitidis 



Histoplasma capsulatum 



Coccidioides immitis 



Paracoccidioides brasiliensis



Candida 

albicans 
tropicalis 

(Torulopsis) glabrata 
parapsilosis



Aspergillus 

fumigatus 
flavus 
niger 
terreus 



Rhinosporidium seeberi



Phycomycetes 



Sporothrix schenckii

Mucorales

Entomophthorales



Agents of chromoblastomycosis 



Microsporum
M. audouinii (ringworm)
M. canis
M. gypseum

Trichophyton
T. schoenleinii (favus-ringworm)
T. violaceum (hair)
T. tonsurans (hair)
T. mentagrophytes (athlete's foot)
T. rubrum (athlete's foot)

Malassezia furfur 

Cladosporium 

werneckii 
carrioni 



Fonsecaea 

pedrosoi 
compacta 



Phialophora verrucosa 



Rhinocladiella aquaspersa 



Trichosporon cutaneum 



Piedraia hortai 



Ascomycota 



Basidiomycota

Deuteromycota

Nocardia

brasiliensis 

caviae 

asteroides 



PARASITIC PATHOGENS 



Plasmodium (malaria) 
falciparum
vivax
ovale
malariae



Schistosoma 

japonicum
mansoni
haematobium
intercalatum
mekongi

Trypanosoma

brucei gambiense
brucei rhodesiense 
evansi 
cruzi 

equiperdum 
congolense 



Entamoeba histolytica 



Naegleria fowleri 



Acanthamoeba

astronyxis
castellanii
culbertsoni
hatchetti
palestinensis
polyphaga
rhysodes



Leishmania 

donovani
infantum
chagasi
tropica
major
aethiopica
mexicana

braziliensis 
peruviana 



Pneumocystis carinii (interstitial pneumonia) 

Babesia (tick-borne hemoprotozoan)
microti

divergens

Giardia lamblia

Trichomonas (venereal disease)
vaginalis
hominis

tenax

Cryptosporidium parvum (intestinal protozoan)

Isospora belli (dysentery)

Balantidium coli (protozoan-induced dysentery)

Dientamoeba fragilis



Blastocystis hominis 






Trichinella spiralis (parasitic nematode) 

Wuchereria bancrofti (lymphatic filariasis) 

Brugia (lymphatic filariasis) 
malayi 

timori 

Loa loa (eye worm) 

Onchocerca volvulus 

Mansonella 

perstans 
ozzardi

streptocerca

Dirofilaria immitis

Angiostrongylus cantonensis
costaricensis
malayensis

mackerrasae

Anisakis (nematode)
simplex

typica

Pseudoterranova decipiens

Gnathostoma spinigerum

Dracunculus medinensis (filarial parasite, guinea worm)

Trichuris trichiura (whip worm)

Ascaris lumbricoides (nematode)

Toxocara canis (nematode round worms) 

Necator americanus (heart worm) 

Ancylostoma (hook worm) 
duodenale 
ceylanicum

americanus

members of the species Trichostrongylus

Strongyloides (intestinal nematode)
stercoralis

fuelleborni

Capillaria philippinensis (intestinal nematode)

Various species of Paragonimus (lung fluke disease)

Various species of Microsporida



Clonorchis sinensis (liver fluke) 



Fasciola (trematode, intestinal worm) 
hepatica 

gigantica 



Fasciolopsis buski 



Heterophyes heterophyes 



Metagonimus yokogawai



Taenia 

saginata (beef tapeworm) 
solium (pork tapeworm) 






Hymenolepis (dwarf tapeworm) 
nana 

nana fraterna 
diminuta 



Dipylidium caninum (tapeworm of dogs and cats) 



Diphyllobothrium (fish tapeworms)
latum
dalliae 
nihonkaiense 
pacificum 



Echinococcus (tapeworm with cysts)
granulosus
multilocularis
vogeli

Enterobius vermicularis (pinworm)



In addition to pathogens, many non-infectious diseases may be
controlled at the level of DNA. These diseases are therefore
potential candidates for treatment with DNA-binding therapeutics
that are discovered or designed using the methods described in
this application. Table II lists a number of non-infectious
diseases that may be targeted for treatment using DNA-binding
molecules.

Table II: Noninfectious Diseases



CANCER 






Lung 

Adenocarcinoma 
Squamous cell 
Small cell 



Breast carcinoma 



Ovarian 

Serous tumors 
Mucinous tumors 
Endometrioid carcinoma 



Endometrial carcinoma 



Colon carcinoma 



Malignant Melanoma 



Prostate carcinoma 



Lymphoma 
Hodgkin's
Non-Hodgkin's 



Leukemia 

Chronic Myelogenous 
Acute Myelogenous 
Chronic Lymphocytic 
Acute Lymphocytic 



Cervical carcinoma 



Seminoma 



Multiple Myeloma 



Bladder carcinoma 



Pancreatic carcinoma 



Stomach carcinoma 



Thyroid 

Papillary adenocarcinoma 
Follicular carcinoma 
Medullary carcinoma 



Oral & Pharyngeal carcinomas 



Laryngeal carcinoma 






Renal cell carcinoma 



Hepatocellular carcinoma 



Glioblastoma 



Astrocytoma 



Meningioma 



Osteosarcoma 



Pheochromocytoma 



CARDIOVASCULAR DISEASES 



Hypertension 
Essential 
Malignant 



Acute Myocardial Infarction 



Stroke 

Ischemic 
Hemorrhagic 



Angina Pectoris 



Unstable angina 



Congestive Heart Failure 



Supraventricular arrhythmias 



Ventricular arrhythmias 



Deep Venous Thrombosis 



Pulmonary Embolism 



Hypercholesterolemia 



Cardiomyopathy 



Hypertriglyceridemia



RESPIRATORY DISORDERS 



Allergic rhinitis 



Asthma 



Emphysema 



Chronic bronchitis 



Cystic Fibrosis 



Pneumoconiosis

Respiratory distress syndrome



Idiopathic pulmonary fibrosis 



Primary pulmonary hypertension 



GASTROINTESTINAL DISORDERS



Peptic ulcers 



Cholelithiasis 



Ulcerative colitis 



Crohn's disease 



Irritable Bowel Syndrome



Gastritis 



Gilbert's syndrome



Nausea 



ENDOCRINE / METABOLIC DISORDERS 






Diabetes mellitus type I 



Diabetes mellitus type II 



Diabetes insipidus



Hypothyroidism 



Hyperthyroidism



Gout 



Wilson's disease 



Addison's disease 



Cushing's syndrome 



Acromegaly 



Dwarfism 



Prolactinemia 



Morbid obesity 



Hyperparathyroidism

Hypoparathyroidism
Osteomalacia

RHEUMATOLOGY/IMMUNOLOGY DISORDERS



Transplant rejection 



Systemic lupus erythematosus 



Rheumatoid arthritis 



Temporal Arteritis 



Amyloidosis 



Sarcoidosis 



Sjogren's Syndrome 



Scleroderma 



Ankylosing spondylitis 



Polymyositis 



Reiter's Syndrome



Polyarteritis nodosa 



Kawasaki's disease 



HEMATOLOGIC DISORDERS 



Anemia 

Sickle cell 
Sideroblastic 
Hereditary spherocytosis 
Aplastic 

Autoimmune hemolytic anemia 



Thalassemia 



Disseminated intravascular coagulation 



Polycythemia vera 



Thrombocytopenia 

Thrombotic thrombocytopenic purpura 
Idiopathic thrombocytopenic purpura 



Hemophilia 



von Willebrand's disease 



Neutropenia 

Post-chemotherapy 
Post-radiation

NEUROLOGIC DISORDERS



Alzheimer's disease 



Parkinson's disease 



Myasthenia gravis 



Multiple sclerosis



Amyotrophic lateral sclerosis 



Epilepsy 



Headaches 
Migraine 
Cluster 
Tension 



Guillain-Barre syndrome 



Pain (post-op, trauma)



Vertigo 



PSYCHIATRIC DISORDERS 



Anxiety 



Depression 



Schizophrenia 



Substance abuse 



Manic-Depression 



Anorexia 



DERMATOLOGIC DISORDERS 



Acne 



Psoriasis 



Eczema 



Contact dermatitis 



Pruritus



OPHTHALMIC DISORDERS 



Glaucoma 



Allergic conjunctivitis 



Macular degeneration

MUSCULOSKELETAL DISORDERS

Osteoporosis

Muscular dystrophy

Osteoarthritis

GENETIC DISORDERS

Down's syndrome

Marfan's syndrome

Neurofibromatosis

Tay-Sachs disease

Gaucher's disease

Niemann-Pick disease

GENITAL-URINARY DISORDERS

Benign prostatic hypertrophy

Polycystic kidney disease

Non-infectious glomerulonephritis

Goodpasture's syndrome

Urolithiasis

Endometriosis

Impotence

Infertility

Fertility control

Menopause

Once a disease or condition is identified as a potential
candidate for treatment with a DNA-binding therapeutic, specific
genes or other DNA sequences that are crucial for the expression
of the disease-associated gene (or survival of a pathogen) are
identified within the biochemical or physiological pathway (or
the pathogen). In humans, many genes involved in important
biological functions have been identified. Virtually any DNA
sequence is a potential target site for a DNA-binding molecule,
including mRNA coding sequences, promoter sequences, origins of
replication, and structural sequences, such as telomeres and
centromeres. One class of sites that may be preferable is the
recognition sequences for proteins that are involved in the
regulation or expression of genetic material. For this reason,
the promoter/regulatory regions of genes also provide potential
target sites (Table III, see also Example 15).



Table III: Human Genes with Promoter Regions that 
are Potential Targets for DNA-Binding Molecules 

* [LOCUS names are from EMBL database ver. 33, 1992.]


LOCUS Names*


Locus Description


>HS5FDX 


Human ferredoxin gene, 5' end. 


>HSA1ATCA 


Human macrophage alphal-antitrypsin 
cap site region 


>HSA1GPB1 


Human gene B for alpha 1-acid glycoprotein
exon 1 and 5' flank


>HSA1MBG1


Human gene for alpha-1-microglobulin-bikunin,
exons 1-5 (encoding


>HSA2MGLB1 


H. sapiens gene for alpha-2 macro- 
globulin, exon 1 


>HSACAA1 


H. sapiens ACAA gene (exons 1 & 2) 
for peroxisomal 3-oxoacyl-CoA 


>HSACCOA 


Homo sapiens choline acetyltransferase
gene sequence.


>HSACEB 


Human angiotensin 1-converting en- 
zyme (ACE) gene, 5' flank. 


>HSACHG1 


Human gene fragment for the acetyl- 
choline receptor gamma subunit 


>HSACT2CK1 


Human cytokine (Act-2) gene, exon 1. 


>HSACTBPR 


Human beta-actin gene 5'-flanking
region


>HSACTCA


Human cardiac actin gene, 5' flank.


>HSACTSA


Human gene for vascular smooth muscle
alpha-actin (ACTSA), 5'


>HSACTSG1


Human enteric smooth muscle
gamma-actin gene, exon 1,


>HSAD12L 


Human arachidonate 12-lipoxygenase
gene, 5' end.


>HSADH1X


Human alcohol dehydrogenase alpha
subunit (ADH1) gene, exon 1.


>HSADH2X


Human alcohol dehydrogenase beta
subunit (ADH2) gene, exon 1.


>HSAFPCP


Human alpha-fetoprotein gene, complete
cds.


>HSAK1


Human cytosolic adenylate kinase
(AK1) gene, complete cds.


>HSAGAL


Human alpha-N-acetylgalactosaminidase
(NAGA) gene, complete cds.


H. sapiens ALAD gene for porphobilinogen
synthase


Human albumin gene enhancer region.


>HSALDA1 


Human aldolase A gene 5' non-coding 


>HSALDCG 


Human aldolase C gene for
fructose-1,6-bisphosphate aldolase


>HSALDOA


Human aldolase A gene (EC 4.1.2.13)


>HSALDOBG


Human DNA for aldolase B transcrip- 
tion start region 


>HSALIFA 


Human leukemia inhibitory factor
(LIF) gene, complete cds.


>HSAMINON


Human aminopeptidase N gene, complete
cds.


>HSAMY2A1


Human alpha-amylase (EC 3.2.1.1)
gene AMY2A 5'-flank and exon 1


>HSAMYB01 


Human amyloid-beta protein (APP) 
gene, exon 1. 1154 


>HSANFG1 


Human gene fragment for pronatriodilatin
precursor (exons 1 and 2)


>HSANFPRE


Human gene for atrial natriuretic 
factor (hANF) precursor 


>HSANF21 


Human atrial natriuretic factor 
gene, complete cds. 


>HSANGG1 


Human angiotensinogen gene 5' region
and exon 1


>HSANT1


Human heart/skeletal muscle ATP/ADP
translocator (ANT1) gene,


>HSAPC3A


Human apolipoprotein CIII gene and
apo AI-apo CIII intergenic


>HSAPC3G


Human gene for apolipoprotein C-III


>HSAPOA2


Human gene for apolipoprotein AII


>HSAPOAIA 


Human fetal gene for apolipoprotein 
AI precursor 


>HSAPOBPRM 


Human apoB gene 5' regulatory region 
(apolipoprotein B) 


>HSAPOC2G 


Human apoC-II gene for preproapo- 
lipoprotein C-II 


>HSAPOCIA 


Human apolipoprotein C-I (VLDL) 
gene, complete cds. 


>HSAPOLIDG 


H. sapiens promoter region of gene 
for apolipoprotein D 


>HSARG1 


Human arginase gene exon 1 and 
flanking regions (EC 3.5.3.1) 


>HSASG5E 


Human argininosuccinate synthetase
gene 5' end 1105


>HSATP1A3S


Human sodium/potassium ATPase alpha
3 subunit (ATP1A3) gene, 5'


>HSBSF2 


Human (BSF-2/IL6) gene for B cell 
stimulatory factor-2 


>HSC5GN 


Human C5 gene, 5' end. 650 


>HSCAII 


Human gene fragment for carbonic 
anhydrase II (exons 1 and 2) 


>HSCALCAC 


Human calcitonin/ alpha-CGRP gene 


>HSCALRT1 


Human DNA for calretinin exon 1 


>HSCAPG 


Human cathepsin G gene, complete
cds.


>HSCAVII1


H. sapiens carbonic anhydrase VII (CA
VII) gene, exon 1.


>HSCBMYHC


heavy chain
flank.


>HSCD4


Human recognition/surface antigen (CD4)
gene, 5' end.


>HSCD44A 


Human hyaluronate receptor (CD44) 
gene, exon 1. 


>HSCFTC 


Human cystic fibrosis transmembrane 
conductance regulator gene, 5' 


>HSCH7AHYR 


Human cholesterol 7-alpha-hydroxyl- 
ase (CYP7) gene, 5' end. 


>HSCHAT 


Human gene for choline acetyltransferase
(EC 2.3.1.6), partial


>HSCHYMASE 


Human mast cell chymase gene, com- 
plete cds. 


>HSCHYMB 


Human heart chymase gene, complete 
cds. 3279 


>HSCKBG 


Human gene for creatine kinase B (EC 
2.7.3.2) 


>HSCNP 


Human C-type natriuretic peptide 
gene, complete cds. 


>HSCD59011 


Human transmembrane protein (CD59)
gene, exon 1.


>HSCDPRO


Human myeloid specific CD11b promoter
DNA.


>HSCETP1


Human cholesteryl ester transfer
protein (CETP) gene, exons 1 and


>HSCFTC


Human cystic fibrosis transmembrane
conductance regulator gene, 5'


>HSCOSEG 


H. sapiens coseg gene for vasopres- 
sin-neurophysin precursor 


>HSCREKIN 


Human creatine kinase gene, exon 1. 


>HSCRYABA 


Human alpha-B-crystallin gene, 5' 
end. 


>HSCS5P 


Human C3 gene, 5' end.


>HSCSF1G1


Human gene for colony stimulating
factor CSF-1 5' region


>HSCSPA


Human cytotoxic serine proteinase
gene, complete cds.


>HSCST3G 


Human CST3 gene for cystatin C 


>HSCST4 


H. sapiens CST4 gene for cystatin D


>HSCYP2C8


Human CYP2C8 gene for cytochrome
P-450, 5' flank and exon 1


Human gene for cholesterol desmolase
cytochrome P-450 (SCC) exon 1


Human steroid 11-beta-hydroxylase
(CYP11B1) gene, exons 1 and 2.


Human gene for steroid 18-hydroxylase
(P-450 C18). 2114


>HSCYPXIB1 


Human CYPXIB gene for steroid 11beta-hydroxylase
(P-450 11beta),


>HSCYPXIX


Human CYPXIX gene, exon 1 coding for
aromatase P-450 (EC 1.14.14.1)


>HSDAFC1 


Human decay-accelerating factor 
(DAF) gene, exons 1 and 2. 


>HSDBH1 


Human DNA for dopamine beta-hydr- 
oxylase exon 1 (EC 1.14.17.1) 


>HSDES 


Human desmin gene, complete cds. 




Human cytokeratin 8 (CK8) gene, com- 
plete cds. 


>HSDNAP0L 


Human DNA polymerase alpha gene, 5' 
end. 


>HSDOPAM 


H. sapiens dopamine D1A receptor 
gene, complete exon 1, and exon 2, 


>HSECP1 


Human DNA for eosinophil cationic 
protein ECP 


>HSEGFA1 


Human HER2 gene, promoter region and 
exon 1. 


>HSEL20 


Human elastin gene, exon 1. 


>HSELAM1B 


Human endothelial leukocyte adhesion
molecule I (ELAM-1) gene,


>HSEMBPA


Human eosinophil major basic protein
gene, complete cds.


>HSENKB1


Human preproenkephalin B gene 5'
region and exon 1


>HSENO35


Human ENO3 gene 5' end for muscle-specific
enolase


>HSEOSDN 


Human DNA for eosinophil derived 
neurotoxin 




Human erythropoietin receptor mRNA
sequence derived from DNA, 5'


Human c-erb B2/neu protein gene,
5' end, and promoter region.


>HSERCC25


Human genomic and mRNA sequence for
ERCC2 gene 5' region involved in


>HSERPA 


Human erythropoietin gene, complete 
cds. 


>HSERR 


Human mRNA for oestrogen receptor 


>HSESTEI1 


H. sapiens exon 1 for elastase I 


>HSFBRGG 


Human gene for fibrinogen gamma
chain


>HSFCERG5


Human lymphocyte IgE receptor gene
5'-region (Fc-epsilon R)


>HSFERG1


Human apoferritin H gene exon 1


>HSFIBBR1


Human fibrinogen beta gene 5' region
and exon 1


>HSFIXG 


Human factor IX gene, complete cds. 


>HSFKBP1 


Human FK506 binding proteins 12 A, 
12B and 12C (FKBP12) mRNA, exons 


>HSFLAP1 


Human 5-lipoxygenase activating protein
(FLAP) gene, exon 1.


>HSFOS


Human fos proto-oncogene (c-fos),
complete cds.


>HSG0S2PE 


Human G0S2 gene, upstream region and 
cds. 


>HSGCSFG 


Human gene for granulocyte colony-stimulating
factor (G-CSF)


>HSGEGR2


Human EGR2 gene 5' region 1233


>HSGHPROM


Human growth hormone (hGH) gene pro- 
moter 


>HSGIPX1 


Human gastric inhibitory polypeptide 
(GIP) mRNA, exon 1. 


>HSGLA 


Human GLA gene for alpha-D-galacto- 
sidase A (EC 3.2.1.22) 


>HSGLUC1 


Human glucagon gene transcription 
start region 732 


>HSGMCSFG 


Human gene for granulocyte-macro- 
phage colony stimulating factor 


>HSGR1 


Human glucocorticoid receptor gene, 
exon 1. 1602 


>HSGRFP1 


Human growth hormone-releasing fac- 
tor (GRF) gene, exon 1 (complete) 


>HSGSTP15 


Human GST pi gene for glutathione
S-transferase pi exon 1 to 5


>HSGTRH 


Human gene for gonadotropin-relea- 
sing hormone 


>HSGYPC 


Human glycophorin C (GPC) gene, exon 
1, and promoter region. 


>HSH10 


Human histone (H10) gene, 5' flank. 


>HSH1DNA 


Human gene for H1 RNA 1057


>HSH1FNC1


Human H1 histone gene FNC16 promoter
region 


>HSH2B2H2 


Human H2B.2 and H2A.1 genes for His- 
tone H2A and H2B 


>HSH4AHIS 


H. sapiens H4/a gene for H4 histone 


>HSH4BHIS 


H. sapiens H4/b gene for H4 histone 


>HSHARA 


Human androgen receptor gene, tran- 
scription initiation sites. 


>HSHCG5B1 


Human chorionic gonadotropin (hCG) 
beta subunit gene 5 5'-flank


>HSHEMPRO 


Human DNA for hemopexin promoter


>HSHIAPPA 


Human islet amyloid polypeptide 
(hIAPP) gene, complete cds. 


>HSHIH4 


Human H4 histone gene


>HSHISH2A


Human histone H2a gene 


>HSHISH2B 


Human histone H2b gene 


>HSHISH3 


Human histone H3 gene 




Human HLA-A1 gene


>HSHLAB27 


Human gene for HLA-B27 antigen 


>HSHLABW 


Human HLA-Bw57 gene 


>HSHLAF 


Human HLA-F gene for human leukocyte 
antigen F 


>HSHLIA 


Human gene for histocompatibility
antigen HLA-A3


Human gene for class I histocompatibility
antigen HLA-CW3


>HSHMG17G 


Human HMG-17 gene for non-histone 
chromosomal protein HMG-17 


>HSHOX3D 


Human HOX3D gene for homeoprotein 
HOX3D 


>HSHSC70 


Human hsc70 gene for 71 kd heat 
shock cognate protein 


>HSHSP70D 


Human heat shock protein (hsp 70) 
gene, complete cds. 


>HSHSP70P 


Human hsp70B gene 5 '-region 


>HSIAPP12 


Human IAPP gene exon 1 and exon 2
for islet amyloid polypeptide


>HSICAMAB


Human intercellular adhesion molecule 1
(ICAM-1) gene, exon 1.


Human interferon-inducible gene
IFI-54K 5' flank


>HSIFNA14 


Human interferon alpha gene 
IFN-alpha 14 


>HSIFNA16 


Human interferon alpha gene IFN-al- 
pha 16 


>HSIFNA5 


Human interferon alpha gene IFN-al- 
pha 5 


>HSIFNA6 


Human interferon alpha gene IFN-al- 
pha 6 


>HSIFNA7 


Human interferon alpha gene IFN-alpha 7


>HSIFNG


Human immune interferon (IFN-gamma)
gene.


>HSIFNIN6


Human alpha/beta-interferon (IFN)-inducible
6-16 gene exon 1 and


factor II (IGF-2); exon 4B


>HSIGFBP1A


Human insulin-like growth factor
binding protein (hIGFBP1) gene


>HSIGK10 


Human germline gene for the leader 
peptide and variable region 


>HSIGK15 


Human germline gene for the leader 
peptide and variable region 


>HSIGK17 


Human rearranged gene for kappa im- 
munoglobulin subgroup V kappa IV 


>HSIGK20 


Human rearranged DNA for kappa immu- 
noglobulin subgroup V kappa III 


>HSIGKLC1 


Human germline fragment for immuno- 
globulin kappa light chain 


>HSIGVA5 


Human germline immunoglobulin kappa 
light chain V-segment 


>HSIL05 


Human interleukin-2 (IL-2) gene and
5'-flanking region


>HSIL1AG 


Human gene for interleukin 1 alpha 
(IL-1 alpha) 


>HSIL1B 


Human gene for prointerleukin 1 beta 


>HSIL2RG1 


Human interleukin 2 receptor gene 5' 
flanking region and exon 1 


>HSIL45 


Human interleukin 4 gene 5 '-region 


>HSIL5 


Human interleukin 5 (IL-5) gene, 
complete cds. 


>HSIL6B 


Human interleukin 6 (IL 6) gene, 5' 
flank. 


>HSIL71 


Human interleukin 7 (IL7) gene, exon 
1. 


>HSIL9A 


Human IL9 protein gene, complete
cds.


>HSINSU


Human gene for preproinsulin, from
chromosome 11. Includes a highly


>HSINT1G


Human int-1 mammary oncogene


>HSJUNCAA


Human jun-B gene, complete cds.


>HSKER65A 


Human DNA for 65 kD keratin type II 
exon 1 and 5' flank 


>HSKERUHS 


Human gene for ultra high-sulphur 
keratin protein 


>HSLACTG 


Human alpha-lactalbumin gene


>HSLAG1G 


Human LAG-1 gene 


>HSLCATG 


Human gene for lecithin-cholesterol
acyltransferase (LCAT)


>HSLCK1 


Human lymphocyte-specific protein
tyrosine kinase (lck) gene


>HSLFACD


Human leukocyte function-associated
antigen-1 (LFA-1 or CD11a)


>HSLPLA


Human lipoprotein lipase (LPL) gene,
5' flank.


>HSLYAM01


Human leukocyte adhesion molecule-1
(LAM-1), exon 1.


>HSLYSOZY


Human lysozyme gene (EC 3.2.1.17)


>HSMBP1A 


Human DNA for mannose binding pro- 
tein 1 (MBP1) , Exon 1 


>HSMCCPAA 


Human mast cell carboxypeptidase A 
(MC-CPA) gene, exons 1-2. 


>HSMDR1 


Human P-glycoprotein (MDR1) mRNA, 
complete cds. 


>HSMED 


Human bone marrow serine protease
gene (medullasin) 


>HSMEHG 


Human DNA (exon 1) for microsomal 
epoxide hydrolase 


>HSMETIE 


Human metallothionein-Ie gene
(hMT-Ie).


>HSMG01 


Human myoglobin gene (exon 1) 


>HSMGSAG 


Human gene for melanoma growth stimulatory
activity (MGSA)


>HSMHCAG1


Human alpha-MHC gene for myosin
heavy chain (N-terminus)


Human class II invariant gamma-chain
gene (5' flank, exon 1)


>HSMHCW5 


Human MHC class I HLA-Cw5 gene, 5' 
flank. 


>HSMLN1 


Human motilin gene exon 1 


>HSMPOA 


Human myeloperoxidase gene, exons
1-4.


>HSMRP


Human mitochondrial RNA-processing
endoribonuclease RNA (mrp) gene


>HSMTS1A


H. sapiens mts1 gene, 5' end.


>HSMYCE12 


Human myc-oncogene exon 1 and exon 2 


>HSNAKATP 


Human Na,K-ATPase beta subunit
(ATP1B) gene, exons 1 and 2.


>HSNEURK1


H. sapiens gene for neuromedin K receptor
(exon 1)


>HSNFH1 


Human gene for heavy neurofilament 
subunit (NF-H) exon 1 


>HSNFIL6 


Human gene for nuclear factor NF-IL6


>HSNFLG


Human gene for neurofilament subunit
NF-L


>HSNK21 


gene, exon 1. 


>HSNraxw 


Untran rtcrm 1 *i r\c» 1*1— TftVC Cf f*T"l^ 

itUIuan yeiin lxiic ii iujt* 


>HSNRASPR 


H. sapiens N-RAS promoter region


>HSODC1A


Human ornithine decarboxylase (ODC1)
gene, complete cds.


>HSOTCEX1


Human ornithine transcarbamylase
(OTC) gene, 5'-end region.


>HSOTNPI


Human prepro-oxytocin-neurophysin I 
gene, complete cds. 


>HSP450SCC 


Human cytochrome P450scc gene, 5' 
end and promoter region. 


>HSP53G 


Human p53 gene for transformation
related protein p53


>HSPADP


Human promoter DNA for Alzheimer's 
disease amyloid A4 precursor 


>HSPAI11 


Human gene for plasminogen activator 
inhibitor 1 (PAI-1) 5'-flank


>HSPGDF 


Human platelet-derived growth factor 
A-chain (PDGF) gene, 5' end 


>HSPGP95G 


Human PGP9.5 gene for neuron-specific
ubiquitin C-terminal


>HSPLSM 


Human plasminogen gene, exon 1. 


>HSPNMTB 


Human gene for phenylethanolamine 
N-methylase (PNMT) (EC 2.1.1.28) 


>HSPOMC5F 


Human opiomelanocortin gene, 5'
flank.


>HSPP14B 


Human placental protein 14 (PP14) 
gene, complete cds. 


>HSPRB3L 


Human gene PRB3L for proline-rich 
protein G1


>HSPRB4S 


Human PRB4 gene for proline-rich 
protein Po, allele S 


>HSPRLNC 


Human prolactin mRNA, partial cds.


>HSPROAA1


Human prothymosin-alpha gene, com- 
plete cds. 


>HSPROT2 


Human protamine 2 gene, complete 
cds. 


>HSPRPE1 


Human SPR2-1 gene for small proline 
rich protein (exon 1) 


>HSPS2G1 


Human estrogen-responsive gene pS2 
5' flank and exon 1 


>HSPSAP 


Human pulmonary surfactant apopro- 
tein (PSAP) gene, complete cds. 


>HSPSP94A 


Human gene for prostatic secretory 
protein PSP-94, exon 1 


>HSPTHRPA 


Human parathyroid hormone-related 
peptide (PTHRP) gene, exons 1A, 


>HSPURNPHO 


Human gene for purine nucleoside 
phosphorylase (upstream region) 


>HSRDNA 


Human rDNA origin of transcription


>HSREGA01


Human regenerating protein (reg) 
gene, complete cds. 


>HSREN01 


Human renin gene 5' region and exon 
1 


>HSRPBG1 


Human gene fragment for retinol binding
protein (RBP) (exon 1-4)


>HSSAA1A 


Human serum amyloid A (GSAA1) gene, 
complete cds. 


>HSSAA1B 


H. sapiens SAA1 beta gene 


>HSSB4B1 


Human gene fragment for HLA class II 
SB 4-beta chain (exon 1)


>HSSISG5


Human c-sis proto-oncogene 5' region


>HSSLIPG


Human SLPI gene for secretory leukocyte
protease inhibitor


>HSSOD1G1


Human superoxide dismutase (SOD-1)
gene exon 1 and 5' flanking


>HSSODB 


Human ornithine decarboxylase gene, 
complete cds. 


>HSSRDA01 


H. sapiens steroid 5-alpha-reductase 
gene, exon 1. 


>HSSUBP1G 


H. sapiens gene for substance P re- 
ceptor (exon 1) 


>HSSYB1A1 


Human synaptobrevin 1 (SYB1) gene, 
exon 1.


>HSTAT1


Human gene for tyrosine aminotransferase
(TAT) (EC 2.6.1.5) exon 1.


>HSTCBV81 


Human T-cell receptor V-beta 8.1 
gene 775 


>HSTCRB21 


Human T-cell receptor beta chain 
gene variable region. 


>HSTFG5 


Human transferrin (Tf) gene 5' region


>HSIL3FL5


Human interleukin 3 gene, 5' flank.


>HSTFPB 


Human tissue factor gene, complete 
cds. 


>HSTGFB1 


Human mRNA for transforming growth
factor-beta (TGF-beta)


>HSTGFB3B


Human transforming growth factor
beta-3 gene, 5' end.


>HSTGFBET2 


Human transforming growth factor 
beta-2 gene, 5' end. 


>HSTH01 


Human tyrosine hydroxylase (TH) (EC
1.14.16.2) gene from upstream


>HSTHI02A 


Human metallothionein gene IIA pro- 
moter region 


>HSTHRO01 


Human thrombospondin gene, exons 1, 
2 and 3. 


>HSTHXBG 


H. sapiens gene for thyroxine-binding
globulin gene


>HSTHYR5 


Human thyroglobulin gene 5' region 


>HSTNFA 


Human gene for tumor necrosis factor 
(TNF-alpha) 


>HSTNFB 


Human gene for lymphotoxin (TNF-beta)


>HSTOP01


Homo sapiens type I DNA topoisomerase
gene, exons 1 and 2.


Human triosephosphate isomerase
(TPI) gene, 5' end.


>HSTPO5


Human thyroid peroxidase gene 5' end
(EC 1.11.1.7) 


>HSTRP 


Human transferrin receptor gene pro- 
moter region 


>HSTRPY1B 


Human tryptase-I gene, complete cds.


>HSTUBB2 


Human beta 2 gene for beta-tubulin 


>HSTYR01E 


Human tyrosinase gene, exon 1 and 5' 
flanking region (EC 1.14.18.1) 


>HSU6RNA 


Human gene for U6 RNA


>HSUPA 


Human uPA gene for urokinase-plas- 
minogen activator 


>HSVAVP01 


Human proto-oncogene vav, 5' end. 


>HSVCAM1A 


Human vascular cell adhesion molecule-1
(VCAM1) gene, complete CDS.


>HSVIM5RR


Human vimentin gene 5' regulatory
region



Once the gene target or, in the case of small pathogens, the
genome target has been identified, short sequences within the
gene or genome target are identified as medically significant
target sites. Medically significant target sites can be defined
as short DNA sequences (approximately 4-30 base pairs) that are
required for the expression or replication of genetic material.
For example, sequences that bind regulatory factors, either
transcriptional or replicatory factors, are ideal target sites
for altering gene or viral expression.
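
The size range stated above (approximately 4-30 base pairs) can be illustrated with a short sketch that lists every subsequence of that length within a region of interest. This is only an illustration under stated assumptions: the function name `candidate_target_sites` and the example sequence are hypothetical and do not come from any gene listed in Table III, and the sketch says nothing about which of the listed windows are actually required for expression or replication.

```python
from typing import Iterator, Tuple


def candidate_target_sites(
    sequence: str, min_len: int = 4, max_len: int = 30
) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for every window of min_len..max_len bases.

    The default bounds follow the "approximately 4-30 base pairs" range
    given in the text; everything else here is an illustrative assumption.
    """
    if min_len < 1 or max_len < min_len:
        raise ValueError("require 1 <= min_len <= max_len")
    seq = sequence.upper()
    for length in range(min_len, max_len + 1):
        # When the window is longer than the sequence, this range is empty.
        for start in range(len(seq) - length + 1):
            yield start, seq[start:start + length]


if __name__ == "__main__":
    region = "TTGACGTCAATAGC"  # invented sequence, for illustration only
    for start, site in candidate_target_sites(region, min_len=4, max_len=6):
        print(start, site)
```

In practice such a list would be narrowed to windows that overlap known binding sites for regulatory factors, as the paragraph above suggests.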

Further, coding sequences may be adequate target sites for
disrupting gene function, although the disruption of a
polymerase complex that is moving along the DNA sequence may
require a stronger binder than does the disruption of the
initial binding of a regulatory protein.

Finally, even non-coding, non-regulatory sequences may be of
interest as target sites (e.g., for disrupting replication
processes or introducing an increased mutational frequency).

Several specific examples of medically significant target sites
are shown in Table IV.



Table IV

MEDICALLY SIGNIFICANT DNA-BINDING SEQUENCES

Test sequence                    DNA-binding Protein   Medical Significance

EBV origin of replication        EBNA                  infectious mononucleosis,
                                                       nasopharyngeal carcinoma
HSV origin of replication        UL9                   oral and genital Herpes
VZV origin of replication        UL9-like              shingles
HPV origin of replication        E2                    genital warts, cervical
                                                       carcinoma
Interleukin 2 enhancer           NFAT-1                immunosuppressant
HIV LTR                          NFAT-1, NFkB          AIDS, ARC
HBV enhancer                     HNF-1                 hepatitis
Fibrinogen promoter              HNF-1                 cardiovascular disease
Oncogene promoter and            ??                    cancer
coding sequences



(Abbreviations: EBV, Epstein-Barr virus; EBNA, Epstein-Barr
virus nuclear antigen; HSV, Herpes Simplex virus; VZV, Varicella
zoster virus; HPV, human papilloma virus; HIV LTR, Human
immunodeficiency virus long terminal repeat; NFAT, nuclear
factor of activated T cells; NFkB, nuclear factor kappaB; AIDS,
acquired immune deficiency syndrome; ARC, AIDS related complex;
HBV, hepatitis B virus; HNF, hepatic nuclear factor.)

For example, origin of replication binding proteins have short,
well-defined binding sites within the viral genome, and these
sites are therefore excellent targets for a competitive
DNA-binding drug. Examples of such proteins include Epstein-Barr
virus nuclear antigen 1 (EBNA-1) (Ambinder, et al.; Reisman, et
al.), E2 (which is encoded by the human papilloma virus) (Chin,
et al.), UL9 (which is encoded by herpes simplex virus type 1)
(McGeoch, et al.), and the homologous protein in varicella
zoster virus (VZV) (Stow, et al.).

Similarly, recognition sequences for DNA-binding proteins that
act as transcriptional regulatory factors are also good target
sites for antiviral DNA-binding drugs. Examples listed in Table
IV include (i) the binding site for hepatic nuclear factor
(HNF-1), which is required for the expression of human hepatitis
B virus (HBV) (Chang), and (ii) NFkB and NFAT-1 binding sites in
the human immunodeficiency virus (HIV) long terminal repeat
(LTR), one or both of which may be involved in the expression of
the virus (Greene, W.C.).

Examples of non-viral DNA targets for DNA-binding drugs are also
shown in Table IV to illustrate the wide range of potential
applications for sequence-specific DNA-binding molecules. For
example, nuclear factor of activated T cells (NFAT-1) is a
regulatory factor that is crucial to the inducible expression of
the interleukin 2 gene in response to signals from the antigen
receptor, which, in turn, is required for the cascade of
molecular events during T cell activation (for review, see
Edwards, C.A., and Crabtree, G.R.). The mechanism of action of
two immunosuppressants, cyclosporin A and FK506, is thought to
involve blocking the inducible expression of NFAT-1 (Schmidt, et
al. and Banerji, et al.). However, the effects of these drugs
are not specific to NFAT-1; therefore, a drug targeted
specifically to the NFAT-1 binding site in the IL-2 enhancer
would be desirable as an improved immunosuppressant.

Targeting the DNA site with a DNA-binding drug
rather than targeting with a drug that affects the
DNA-binding protein (presumably the target of the current
immunosuppressants) is advantageous for at least two
reasons: first, there are many fewer target sites for
specific DNA sequences than for specific proteins (e.g.,
in the case of glucocorticoid receptor, a handful of
DNA-binding sites vs. about 50,000 protein molecules in
each cell); and second, only the targeted gene need be
affected by a DNA-binding drug, while a protein-binding
drug would disable all the cellular functions of the
protein. An example of the latter point is the binding
site for HNF-1 in the human fibrinogen promoter.
Fibrinogen level is one of the factors most highly
correlated with cardiovascular disease. A drug targeted
to either HNF-1 or the HNF-1 binding site in the
fibrinogen promoter might be used to decrease fibrinogen
expression in individuals at high risk for disease
because of the over-expression of fibrinogen. However,
since HNF-1 is required for the expression of a number
of normal hepatic genes, blocking the HNF-1 protein
would be toxic to liver function. In contrast, by
blocking a DNA sequence that is composed in part of the
HNF-1 binding site and in part of flanking sequences
that provide divergence, the fibrinogen gene can be
targeted with a high level of selectivity, without harm
to normal cellular HNF-1 functions.

The assay has been designed to screen virtually
any DNA sequence. Test sequences of medical
significance include viral or microbial pathogen genomic
sequences and sequences within or regulating the
expression of oncogenes or other inappropriately expressed
cellular genes. In addition to the detection of
potential antiviral drugs, the assay of the present
invention is also applicable to the detection of
potential drugs for (i) disrupting the metabolism of
other infectious agents, (ii) blocking or reducing the
transcription of inappropriately expressed cellular
genes (such as oncogenes or genes associated with
certain genetic disorders), and (iii) enhancing or
altering the expression of certain cellular genes.

2. Defined Sets of Test Sequences.
The approach described in the above section
emphasizes screening large numbers of fermentation
broths, extracts, or other mixtures of unknowns against
specific medically significant DNA target sequences.
The assay can also be utilized to screen a large number
of DNA sequences against known DNA-binding drugs to
determine the relative affinity of the single drug for
every possible defined specific sequence. For example,
there are 4^n possible sequences, where n = the number
of nucleotides in the sequence. Thus, there are 4^3 =
64 different three base pair sequences, 4^4 = 256
different four base pair sequences, 4^5 = 1024 different
five base pair sequences, etc. If these sequences are
placed in the test site, the site adjacent to the
screening sequence (the example used in this invention
is the UL9 binding site), then each of the different
test sequences can be screened against many different
DNA-binding molecules.
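
The counts given above can be checked with a short enumeration. The following is only an illustrative sketch, not part of the assay; the function name `all_test_sequences` is hypothetical, and the test site is assumed to be written as the top-strand sequence only.

```python
from itertools import product

BASES = "ACGT"

def all_test_sequences(n):
    """Return every possible n-base top-strand sequence (4**n in total)."""
    return ["".join(p) for p in product(BASES, repeat=n)]

# Compare the enumerated count with 4^n for the lengths named in the text.
for n in (3, 4, 5):
    print(n, len(all_test_sequences(n)), 4 ** n)
```

Each returned string would correspond to one test oligonucleotide in which that sequence occupies the test site adjacent to the screening sequence.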
The test sequences may be placed on either or both
sides of the screening sequence, and the sequences
flanking the other side of the test sequences are fixed
sequences to stabilize the duplex and, on the 3' end of
the top strand, to act as an annealing site for the
primer (see Example 1). In Figure 14B, the TEST and
SCREENING sequences are indicated. The preparation of
such double-stranded oligonucleotides is described in
Example 1 and illustrated in Figure 4.

The test sequences, denoted in Figure 14B as X:Y
(where X = A, C, G, or T and Y = the complementary
sequence, T, G, C, or A), may be any of the 256 different
4 base pair sequences shown in Figure 13.

Once a set of test oligonucleotides containing all
possible four base pair sequences has been synthesized
(see Example 1), the set can be screened with any
DNA-binding drug. The relative effect of the drug on each
oligonucleotide assay system will reflect the relative
affinity of the drug for the test sequence. The entire
spectrum of affinities for each particular DNA sequence
can therefore be defined for any particular DNA-binding
drug. This data, generated using the assay of the
present invention, can be used to facilitate molecular
modeling programs and/or be used directly to design new
DNA-binding molecules with increased affinity and
specificity.

Another type of ordered set of oligonucleotides
that may be useful for screening is a set composed of
scrambled sequences with fixed base composition. For
example, if the recognition sequence for a protein is
5'-GATC-3' and libraries were to be screened for
DNA-binding molecules that recognized this sequence, then
it would be desirable to screen sequences of similar
size and base composition as control sequences for the
assay. The most precise experiment is one in which all
possible 4 bp sequences are screened. In the case of
a 4 base-pair sequence, this represents 4^4 = 256
different test sequences, a number of screening
sequences that may not be practical in every situation.
However, there are many fewer possible 4 bp sequences
with the same base composition (1G, 1A, 1T, 1C) (n!
= 24 different 4 bp sequences with this particular
base composition); such sequences provide excellent
controls without having to screen large numbers of
sequences.
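
As a hedged illustration of the scrambled control set described above, the 4! = 24 orderings of the bases in GATC can be listed directly. The function name `scrambled_controls` is hypothetical and is not taken from the specification.

```python
from itertools import permutations

def scrambled_controls(site):
    """Return all distinct orderings of the bases in `site`, sorted."""
    return sorted({"".join(p) for p in permutations(site)})

controls = scrambled_controls("GATC")
# The set includes the recognition sequence itself plus its scrambled forms.
print(len(controls), controls[:4])
```

Using a set removes duplicate orderings, so the same function also works for sites with repeated bases, where fewer than n! distinct sequences exist.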

3. Theoretical Considerations in Choosing
Biological Target Sites: Specificity
and Toxicity.

One consideration in choosing sequences to
screen using the assay of the present invention is test
sequence accessibility, that is, the potential exposure
of the sequence in vivo to binding molecules. Cellular
DNA is packaged in chromatin, rendering most sequences
relatively inaccessible. Sequences that are actively
transcribed, particularly those sequences that are
regulatory in nature, are less protected and more
accessible to both proteins and small molecules. This
observation is substantiated by a large literature on
DNAase I sensitivity, footprinting studies with
nucleases and small molecules, and general studies on
chromatin structure (Tullius). The relative
accessibility of a regulatory sequence, as determined by
DNAase I hypersensitivity, is likely to be several
orders of magnitude greater than that of an inactive
portion of the cellular genome. For this reason the
regulatory sequences of cellular genes, as well as viral
regulatory or replication sequences, are useful regions
to choose for selecting specific inhibitory small
molecules using the assay of the present invention.
Another consideration in choosing sequences to be
screened using the assay of the present invention is
the uniqueness of the potential test sequence. As
discussed above for the nuclear protein HNF-1, it is
desirable that small inhibitory molecules are specific
to their target with minimal cross reactivity. Both
sequence composition and length affect sequence
uniqueness. Further, certain sequences are found less
frequently in the human genome than in the genomes of
other organisms, for example, mammalian viruses.
Because of base composition and codon utilization
differences, viral sequences are distinctly different
from mammalian sequences. As one example, the
dinucleotide CG is found much less frequently in mammalian
cells than the dinucleotide sequence GC; further, in
SV40, a mammalian virus, the sequences AGCT and ACGT
are represented 34 and 0 times, respectively. Specific
viral regulatory sequences can be chosen as test
sequences keeping this bias in mind. Small inhibitory
molecules identified which bind to such test sequences
will be less likely to interfere with cellular
functions.
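
Representation counts of the kind quoted above for AGCT and ACGT could be obtained with a simple overlapping search. The sketch below is illustrative only: `count_sites` is a hypothetical helper, and the short string `toy` is invented for demonstration and is not SV40 sequence.

```python
def count_sites(genome, motif):
    """Count occurrences of `motif` in `genome`, allowing overlaps."""
    genome, motif = genome.upper(), motif.upper()
    width = len(motif)
    return sum(
        1
        for i in range(len(genome) - width + 1)
        if genome[i:i + width] == motif
    )

toy = "AGCTAGCTTTACGTAGCT"  # invented sequence, for illustration only
print(count_sites(toy, "AGCT"), count_sites(toy, "ACGT"))
```

Both AGCT and ACGT are identical to their own reverse complements, so counting on one strand is sufficient for these two motifs; a non-palindromic motif would also need its reverse complement counted.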

There are approximately 3 x 10^9 base pairs (bp) in
the human genome. Of the known DNA-binding drugs for
which there is crystallographic data, most bind 2-5 bp
sequences. There are 4^4 = 256 different 4 base
sequences; therefore, on average, a single 4 bp site is
found roughly 1.2 x 10^7 times in the human genome. An
individual 8 base site would be found, on average,
about 50,000 times in the genome. On the surface, it
might appear that drugs targeted at even an 8 bp site
might be deleterious to the cell because there are so
many binding sites; however, several other
considerations must be recognized.
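
The site frequencies quoted above follow from dividing the genome size by the number of possible sequences of a given length. A minimal sketch, assuming a uniform random base composition (a simplification that real genomes do not satisfy) and using the hypothetical name `expected_sites`:

```python
GENOME_BP = 3e9  # approximate size of the human genome, from the text

def expected_sites(site_length, genome_bp=GENOME_BP):
    """Average number of occurrences of one specific site of the given
    length, assuming all 4**site_length sequences are equally likely."""
    return genome_bp / 4 ** site_length

print(f"4 bp site: {expected_sites(4):.2e}")
print(f"8 bp site: {expected_sites(8):.2e}")
```

Under this assumption a 4 bp site occurs about 1.2 x 10^7 times and an 8 bp site about 4.6 x 10^4 times, which the text rounds to about 50,000.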

First, most DNA is tightly wrapped in chromosomal
proteins and is relatively inaccessible to incoming
DNA-binding molecules, as demonstrated by the
nonspecific endonucleolytic digestion of chromatin in the
nucleus (Edwards, C.A. and Firtel, R.A.). Active
transcription units are more accessible, but the most
highly exposed regions of DNA in chromatin are the
sites that bind regulatory factors. As demonstrated by
DNAase I hypersensitivity (Gross, D.S. and Garrard,
W.T.), regulatory sites may be 100-1000 times more
sensitive to endonucleolytic attack than the bulk of
chromatin. This is one reason for targeting regulatory
sequences with DNA-binding drugs.

Secondly, several anticancer drugs that bind 2, 3,
or 4 bp sequences have sufficiently low toxicity that
they can be used as drugs. This indicates that,
if high affinity binding sites for known drugs can be
matched with specific viral target sequences, it may be
possible to use currently available drugs as antiviral
agents at lower concentrations than they are currently
used, with a concomitantly lower toxicity.

4. Further Considerations in Choosing
Target Sites: Finding Eukaryotic Promoters.

Eukaryotic organisms have three RNA polymerases
(Pol I, II, and III) that transcribe genetic
information from DNA to RNA. The correct regulation of
this information flow is essential for the survival of
the cell. These multi-subunit enzymes need additional
proteins to regulate transcription. Many of these
additional proteins bind to DNA in a region 5' of the
translation start site for a gene: this region is
generally known as the promoter region of the gene.

All three polymerases use a core set of general
transcription proteins to bind to this region. A
central component of this complex is the protein called
TBP or TFIID. The site this protein binds to is known
as the TATA-box because the sequence usually contains
a sequence motif similar to TATA (e.g., TATAa/tAa/t).
Originally it was thought that each of the three
polymerases used a separate set of general
transcription factors and that Pol II used TFIID exclusively.
Recently it has been shown that all three classes of
RNA polymerase need TFIID for transcriptional
regulation (see Comai, et al.; and Greenblatt).

A molecule that binds to a DNA sequence closely
adjacent or overlapping a TATA binding site will likely
alter transcriptional regulation of the gene. If the
molecule binds based solely on specificity to the
TATA-box sequence itself, then this molecule is expected to
be very toxic to cells since the transcription of most
genes would be altered. The sequences adjacent to TATA
boxes, however, are not conserved. Accordingly, if a
particular sequence is selected adjacent a TATA box of
a particular gene, a molecule that binds to this
specific sequence would be expected to alter the
transcriptional regulation of just that gene.

TATA-boxes were first identified by determining
the sequence of the DNA located 5' of the RNA start
sites of a number of genes. Examination of these
sequences revealed that most genes had a TATA-box motif
(consensus sequence) in the range of 50 to 15
nucleotides 5' of the RNA start site. In vitro
studies, typically DNA protection (footprinting)
studies, led to the conclusion that proteins were
binding to these sites. Further in vitro DNA binding
experiments demonstrated that some proteins could
specifically bind to these sites. This led to assays
that allowed purification and subsequent sequencing of
the binding proteins. This information facilitated the
cloning and expression of genes encoding the binding
proteins. A large number of transcription factors are
now known. The protein designated TFIID has been
demonstrated to bind to the TATA-box (Lee, et al.).

Molecules that interfere with the interaction of
these transcription factors and their target DNA (i.e.,
DNA/Protein transcription complexes) are also expected
to alter transcription initiated from the target DNA.
A database of these factors and the sequences to which
they bind is publicly available from the
National Library of Medicine and is called "The
Transcription Data Base, or TFD." The binding sites of
these transcription factors can be identified in the 5'
non-coding region of genes having known sequences
(Example 15).

The ability to select target sequences adjacent
the binding site of a transcription factor, as
described above for TFIID, can be applied to other
general transcription factors as well. For the purpose
of the present invention, a general transcription
factor is one that regulates the transcriptional
expression of more than one gene. For any such general
transcription factor, as for TFIID above, a particular
target sequence can be selected adjacent the
transcription factor binding site of a selected gene. A
molecule that binds to this specific target sequence would
be expected to alter the transcriptional regulation of
just that gene and not all of the genes for which the
transcription factor regulates expression. Alteration
of transcriptional regulation may involve inhibition or
increased affinity (enhancement) of binding of a
transcription factor to its cognate DNA.

Many examples of such general transcription
factors have been identified, including, but not
limited to, the following: SP1 (Raney, et al., 1992;
Kitadai, et al., 1992); NFAT-1 (Shaw, et al., 1988);
Ets family of transcription factors, including Elf1
(Thompson, et al., 1992); Fos protein (Neuberg, et al.,
1991); NF-kappa (Wirth, et al., 1988; Meijer, et al.,
1992); and AP1-like proteins, including the product of
the c-jun oncogene (Descheemaeker, et al., 1992; Ryder
et al., 1988; Harshman et al., 1988; Angel et al.,
1988; Bos et al., 1988; Bohmann et al., 1987).

Accordingly, for a selected gene, non-conserved
DNA surrounding the transcription factor binding site
can be chosen as a specific target sequence for small
molecule binding. A small molecule can be chosen whose
binding overlaps an adjacent transcription factor DNA
binding sequence (e.g., by 1-3 nucleotide pairs). In
this case, the specificity of DNA binding for the small
molecule is, in large part, derived from the
non-conserved sequences adjacent the transcription factor
binding site, in order to reduce small molecule binding
at the transcription factor binding site associated
with other genes.

Small molecules that bind such specific target se- 
quences can be identified and/or designed using the 
assay and methods of the present invention. 

5. Further Considerations in Choosing
Alternative Small-Molecule-Binding
Target Sites.

Small molecules that interfere with the
interaction of any DNA binding protein and its cognate DNA
(i.e., DNA/Protein complexes) can be selected by the
assay and methods of the present invention. As
described above for general transcription factors,
sequences adjacent the DNA binding site for a selected
DNA binding protein can serve as a target for small
molecule binding in order to alter the interaction of
the DNA binding protein and its cognate site. The
small molecule can affect the DNA:protein interaction,
for example, by inhibiting or enhancing the association
of protein with the DNA.

For a selected DNA:protein interaction,
non-conserved DNA surrounding the selected DNA binding site
can be chosen as a specific target sequence for small
molecule binding. In some cases the small molecule
binding can overlap the DNA binding site: for example,
in the case of a therapeutic used to treat a mammal
with a bacterial infection, a small molecule may be
selected to bind to the bacterial origin of DNA
replication. Such a small molecule may essentially
completely overlap the region defined by the bacterial
origin-of-replication-DNA:protein interaction since a
corresponding target sequence is not likely present in
the DNA of the mammalian host.

However, in the case where selective binding is
required, as described above for TFIID, the specificity
of the small molecule for DNA binding should
essentially derive from the non-conserved sequences adjacent the
DNA-binding protein's cognate DNA-binding site. This
results in small molecule binding being reduced at
similar DNA:protein binding sites at other locations.

6. Further Considerations in Choosing
Target Sites: Prokaryotes and Viruses.

Bacterial gene expression is regulated at
several different levels, including transcription.
General and specific transcription factors are needed
along with the core RNA polymerase to accurately
produce appropriate amounts of mRNA. Antibiotics that
bind to the RNA polymerase and prevent mRNA production
are potent bacterial poisons: molecules that could
interfere with the initiation of transcription for
specific essential genes are expected to have similar
effects.

Many bacterial promoters have been sequenced and
carefully examined. In general, the majority of
bacterial promoters have two well characterized
regions, the -35 region, which has a consensus sequence
similar to SEQ ID NO: 625, and the -10 region, with a
consensus sequence of SEQ ID NO: 626. The sequence of
the start site for RNA polymerase, however, is not
always the same. The start site is determined by a
supplementary protein called the sigma factor, which
confers specificity for binding the RNA polymerase
core. Several sigma factors are present in any species
of bacteria. Each sigma factor recognizes a different
set of promoter sequences. Expression of sigma factors
is regulated, typically, by the growth conditions the
bacteria are encountering. These sigma factor promoter
sequences represent excellent targets for
sequence-specific DNA binding molecules.
As an example of choosing target sequences for the
purpose of designing a DNA-binding therapeutic for a
bacterial disease, consider the example of
tuberculosis. Tuberculosis is caused by Mycobacterium
tuberculosis.

All bacteria need to make ribosomes for the
purpose of protein synthesis. The -35 and -10 regions
for M. tuberculosis ribosomal RNA synthesis have been
determined. In the EMBL locus MTRRNOP the -35 signal
is located at coordinates 394..400 and the -10 signal
is found at coordinates 419..422. These regions
represent excellent targets for a DNA binding drug that
would inhibit the growth of the bacteria by disrupting
its ability to make ribosomes and synthesize protein.
Multiple other essential genes could be targeted in a
similar manner.

M. tuberculosis is a serious public health problem
for several reasons, including the development of
antibiotic resistant strains. Many antibiotics inhibit
the growth of bacteria by binding to a specific protein
and inhibiting its function. An example of this is the
binding of rifampicin to the beta subunit of the
bacterial RNA polymerase. Continued selection of
bacteria with an agent of this kind can lead to the
selection of mutants having an altered RNA polymerase
so that the antibiotic can no longer bind it. Such
mutants can arise from a single mutation.

However, when a drug binds to a DNA regulatory region,
at least two mutations are required to escape the
inhibitory effect of the drug: one mutation in the target DNA
sequence so that the drug can no longer bind the target
sequence, and one mutation in the regulatory binding
protein so that it can recognize the new, mutated
regulatory sequence. Such a double mutation event is
much less frequent than the single mutation discussed
above, for example, with rifampicin. Accordingly, it
is expected that the development of drug resistant
bacteria would be much less common for DNA-binding
drugs that bind to promoter sequences.

The HIV viral promoter region (shown in Figure 28)
provides an example of choosing DNA target sequences
for sequence-specific DNA binding drugs to inhibit
viral replication.

Many eukaryotic viruses use promoter regions that
have similar features to normal cellular genes. The
replication of these viruses depends on the general
transcription factors present in the host cell. As
such, the promoter sequences in DNA viruses are similar
to those found in cellular genes and have been
well-studied. The binding factors Sp-1 and TFIID are
important generalized factors that most viral promoters
use.

In the HIV promoter sequence found in LOCUS
HIVBH101 in version 32 of the EMBL databank, three
tandem decanucleotide Sp1 binding sites are located
between positions 377 and 409. Site III shows the
strongest affinity for the cellular factor. The three
sites cause up to a tenfold effect on transcriptional
efficiency in vitro. The transcription start site is
at position 455, with a TATA box at 427-431 in the
sequence listed below. In addition to these sites, there
are two NF-kappa-B sites in this region between
nucleotides 350 and 373. These sites are annotated in
Figure 28.

Sequence-specific DNA binding molecules that
specifically disrupted this binding would be expected
to disrupt HIV replication. For example, the sequences
adjacent to the TFIID binding site (SEQ ID NO: 628
and/or SEQ ID NO: 629) would be target sites for a
DNA-binding molecule designed to disrupt TFIID binding.
These sequences are found in HIV but are not likely to
occur overlapping TFIID binding sites in the endogenous
human genome. Multiple sites could be targeted to
decrease the likelihood that a single mutation could
prevent drug binding.

D. Using Test Matrices and Pattern Matching for
the Analysis of Data.

The assay described herein has been designed to
use a single DNA:protein interaction to screen for
sequence-specific and sequence-preferential DNA-binding
molecules that can recognize virtually any specified
sequence. By using sequences flanking the recognition
site for a single DNA:protein interaction, a very large
number of different sequences can be tested. Data from
such experiments, displayed as matrices and analyzed by
pattern matching techniques, should yield information
about the relatedness of DNA sequences.

The basic principle behind the DNA:protein assay
of the present invention is that when molecules bind
DNA sequences flanking the recognition sequence for a
specific protein, the binding of that protein is
blocked. Interference with protein binding likely
occurs by either (or both) of two mechanisms: (i)
directly by steric hindrance, or (ii) indirectly by
perturbations transmitted to the recognition sequence
through the DNA molecule.

Both of these mechanisms will presumably exhibit
distance effects. For inhibition by direct steric
hindrance, direct data for very small molecules is
available from methylation and ethylation interference
studies. These data suggest that for methyl and ethyl
moieties, the steric effect is limited by distance
effects to 4-5 base pairs. Even so, the number of
different sequences that can theoretically be tested
for these very small molecules is very large
(i.e., 5 base pair combinations total 4^5 (=1024)
different sequences).

In practice, the size of sequences tested can be
explored empirically for different sized test
DNA-binding molecules. A wide array of sequences with
increasing sequence complexity can be routinely
investigated. This may be accomplished efficiently by
synthesizing degenerate oligonucleotides and
multiplexing oligonucleotides in the assay process (i.e., using
a group of different oligonucleotides in a single
assay) or by employing pooled sequences in test
matrices.

In view of the above, assays employing a specific
protein and oligonucleotides containing the specific
recognition site for that protein flanked by different
sequences on either side of the recognition site can be
used to simultaneously screen for many different
molecules, including small molecules, that have binding
preferences for individual sequences or families of
related sequences. Figure 12 demonstrates how the
analysis of a test matrix yields information about the
nature of competitor sequence specificity. As an
example, to screen for molecules that could
preferentially recognize each of the 256 possible
tetranucleotide sequences (Figure 13), oligonucleotides could be
constructed that contain these 256 sequences
immediately adjacent to an 11 bp recognition sequence of UL9 oriS
SEQ ID NO: 615, which is identical in each construct.
In Figure 12, "+" indicates that the mixture
retards or blocks the formation of DNA:protein
complexes in solution and "-" indicates that the mixture had
no marked effect on DNA:protein interactions. The
results of this test are shown in Table V.



                    Table V

Test Mix      Specificity
#1, 4, 7      none detected for the above oligos
#2            either nonspecific or specific for
              recognition site
#3            AGCT
#5            CATT or ATT
#6            GCATTC, GCATT, CATTC, GCAT, or ATTC
#8            CTTT



These results demonstrate how such a matrix
provides data on the presence of sequence specific
binding activity in a test mixture and also provides
inherent controls for non-specific binding. For
example, the effect of test mix #8 on the different
test assays reveals that the test mix preferentially
affects the oligonucleotides that contain the sequence
CCCT. Note that the sequence does not have to be
within the test site for test mix #8 to exert an
effect. By displaying the data in a matrix, the
analysis of the sequences affected by the different
test mixtures is facilitated.

Furthermore, defined, ordered sets of
oligonucleotides can be screened with a chosen DNA-binding
molecule. The results of these binding assays can then be
examined using pattern matching techniques to determine
the subsets of sequences that bind the molecule with
similar binding characteristics. If the structural and
biophysical properties (such as geometric shape and
electrostatic properties) of sequences are similar,
then it is likely that they will bind the molecule with
similar binding characteristics. If the structural and
biophysical properties of sequences are different, then
it is likely that they will not bind the molecule with
similar binding characteristics. In this context, the
assay might be used to group defined, ordered sequences
into subsets based on their binding characteristics:
for example, the subsets could be defined as high
affinity binding sites, moderate affinity binding
sites, and low affinity binding sites. Sequences in
the subsets with positive attributes (e.g., high
affinity binding) have a high probability of having
similar structural and biophysical properties to one
another.
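
The grouping into high, moderate, and low affinity subsets described above could be sketched as a simple binning step. Everything specific in this example is an assumption made for illustration: the function name `group_by_affinity`, the use of a normalized score between 0 and 1 as the assay readout, and the two cut-off values are not taken from the specification.

```python
def group_by_affinity(scores, high=0.66, low=0.33):
    """Bin test sequences into affinity subsets.

    `scores` maps each test sequence to a normalized assay effect
    (hypothetical scale: 0 = no effect, 1 = complete block of
    protein binding). The cut-offs are arbitrary placeholders.
    """
    groups = {"high": [], "moderate": [], "low": []}
    for seq, score in sorted(scores.items()):
        if score >= high:
            groups["high"].append(seq)
        elif score >= low:
            groups["moderate"].append(seq)
        else:
            groups["low"].append(seq)
    return groups

# Invented readings for four test sequences.
example = {"AGCT": 0.91, "ACGT": 0.12, "CATT": 0.48, "CTTT": 0.70}
print(group_by_affinity(example))
```

In practice the cut-offs would have to be chosen empirically from the assay data; comparing the groupings obtained for several DNA-binding molecules is then a pattern matching exercise over these subsets.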

By screening and analyzing the binding
characteristics of a number of DNA-binding molecules against the
same defined set of DNA sequences, data can be
accumulated about the subsets of sequences that fall into the
same or similar subsets. Using this pattern matching
approach, which can be computer-assisted, the sequences
with similar structural and biophysical properties can
be grouped empirically.

The database arising from pattern matching
analysis of raw assay data will lead to the increased
understanding of sequence structure and thereby lead to
the design of novel DNA-binding molecules with related
but different binding activities.

E. Applications for the Determination of the
Sequence Specificity of DNA-Binding Drugs.

Applications for the determination of the sequence
specificity of DNA-binding drugs are described below.
The applications are divided into drug homo- and
heteromeric polymers (part 1) and sequence-specific
DNA-binding molecules as facilitators of triple strand
formation (part 2).

One utility of the assay of the invention is the
identification of highest affinity binding sites among
all possible sites of a certain length for a given
DNA-binding molecule. This information may be valuable to
the design of new DNA-binding therapeutics.

1. Multimerization of Sequence-Preferential
or Sequence-Specific DNA-Binding
Molecules Identified in the Assay.

Any particular DNA-binding small molecule
screened in the assay may only recognize a 2-4 base
pair site, and even if the recognition is quite
specific, the molecule may be toxic because there are
so many target sites in the genome (3 x 10^9/4^4 sites
for a 4 bp sequence, for example). However, if drugs with
differential affinity for different sites are identified, the
toxicity of DNA-binding drugs may be drastically
reduced by creating dimers, trimers, or multimers with
these drugs (Example 13). From theoretical
considerations of the free energy changes accompanying the
binding of drugs to DNA, the intrinsic binding constant
of a dimer should be the square of the binding constant
of the monomer (Le Pecq, J.B.). Experimental data
confirmed this expectation in 1978 with dimer analogs
of ethidium bromide (Kuhlmann, et al.). Dimerization
of several intercalating molecules, in fact, yields
compounds with DNA affinities raised from 10^5 M^-1 for
the corresponding monomer to 10^8 to 10^9 M^-1 for the
dimers (Skorobogaty, et al.; Gaugain, et al. (1978a and
b); Le Pecq, et al.; Pelaprat, et al.).
Trimerization, which theoretically should yield binding
affinities that are the cube of the affinity of the
homomonomeric subunit or the product of affinities of
the heteromonomeric subunits, has yielded compounds
with affinities as high as 10^12 M^-1 (Laugaa, et al.).
Such affinity is markedly better than the affinities
seen for many DNA regulatory proteins.

As a hypothetical example, if a relatively weak
DNA-binding drug, drug X, which binds a 4 bp site with
an affinity of 2 x 10^5 M^-1 were dimerized, the bis-X drug
would now recognize an 8 bp site with a theoretical
affinity of 4 x 10^10 M^-1. The difference in affinity
between the monomer X and the bis-X form is 200,000-fold.
The number of 4 bp sites in the genome is
approximately 1.2 x 10^7 versus the number of 8 bp sites
in the genome, which is approximately 5 x 10^4. Accordingly,
there are 256-fold fewer 8 bp sites than 4 bp
sites. Thus, the number of high affinity target sites
is 256-fold fewer for the bis-X molecule than the
number of low affinity target sites for the monomer X,
with a 200,000-fold difference in affinity between the
two types of sites.
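
The arithmetic of this hypothetical example can be checked with a short sketch. It assumes, as the text does, a genome of roughly 3 x 10^9 bp, treats every sequence of a given length as equally frequent, and takes the dimer affinity as the square of the monomer affinity. The function and variable names are illustrative only and are not part of the assay.

```python
# Sketch only: the genome size and the equal-frequency assumption follow
# the text; all names here are illustrative.
GENOME_BP = 3e9  # approximate genome size assumed in the text


def expected_sites(site_length_bp, genome_bp=GENOME_BP):
    """Expected occurrences of one specific site of the given length,
    assuming every sequence of that length is equally frequent."""
    return genome_bp / 4 ** site_length_bp


def dimer_affinity(monomer_affinity):
    """Theoretical dimer affinity: the square of the monomer affinity."""
    return monomer_affinity ** 2


k_monomer = 2e5                      # drug X on its 4 bp site
k_dimer = dimer_affinity(k_monomer)  # bis-X on an 8 bp site
sites_4bp = expected_sites(4)
sites_8bp = expected_sites(8)

print(f"bis-X affinity:  {k_dimer:.0e}")
print(f"affinity gain:   {k_dimer / k_monomer:,.0f}-fold")
print(f"4 bp sites:      {sites_4bp:.1e}")
print(f"8 bp sites:      {sites_8bp:.1e}")
print(f"site reduction:  {sites_4bp / sites_8bp:.0f}-fold")
```

Under these assumptions the site counts come out in line with the approximate figures quoted above, and the ratio between them is exactly 4^4 = 256.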

Since the binding constant of a dimer is the
product of the binding constants of the monomers, when
monomers with higher initial binding constants are
formed into dimers (or multimers) the differential
effect is proportionately increased, creating a wider
"window" of affinity versus the number of binding
sites. The breadth of the window essentially reflects
the margin of effective drug concentration compared to
the relative toxicity.

There are two immediate ramifications of dimerization
(or multimerization) of monomeric drugs with
moderate toxicity and sequence preference. First, the
concentration of drug needed is lowered because of the
higher affinity, so that even relatively toxic molecules
can be used as drugs. Second, since toxicity is
likely linked to the average number of drug molecules
bound to the genome, as specificity is increased by
increasing the length of the binding site, toxicity is
decreased.

Given the information already available on
sequence-preferential binding of DNA-binding drugs, it is
likely that each drug presented to the screening assay
will have (i) a number of high affinity binding sites
(e.g., 10 to 100-fold better affinity than the average
site), (ii) a larger number of sites that are bound
with moderate affinity (3 to 10-fold better affinity
than average), (iii) the bulk of the binding sites
having average affinity, and (iv) a number of sites
having worse-than-average affinity. This range of
binding affinities will likely resemble a bell-shaped
curve. The shape of the curve will probably vary for
each drug. To exemplify, assume that approximately
five 4 bp sites will be high affinity binding sites,
and twenty 4 bp sites will be moderately high affinity
binding sites; then any given drug may recognize
roughly 25 high or moderately high affinity binding
sites. If 50 to 100 drugs are screened, this represents
a "bank" of potentially 250-500 high affinity
sites and 1000-2500 moderately high affinity sites.
Thus, the probability of finding a number of high
affinity drug binding sites that match medically
significant target sites is good. Furthermore,
heterodimeric drugs can be designed to match DNA target
sites of 8 or more bp, lending specificity to the
potential pharmaceuticals.

As discussed above, once the sequence preferences
are known, the information may be used to design
oligomeric molecules (homopolymers or heteropolymers)
with substantially greater sequence specificity and
substantially higher binding affinity. For example, if
a DNA-binding molecule, X, binds a 4 bp sequence
5'-ACGT-3'/5'-ACGT-3' with an equilibrium affinity
constant of 2 x 10^5 M^-1, then the dimer of X, X2, should
bind the dimer of the sequence,
5'-ACGTACGT-3'/5'-ACGTACGT-3', with an equilibrium affinity constant of
(2 x 10^5 M^-1)^2 = 4 x 10^10 M^-2. The DNA-binding dimer
molecule, X2, recognizes an 8 bp sequence, conferring
higher sequence specificity, with a binding affinity
that is theoretically 200,000-fold higher than the
DNA-binding monomer, X.

The same argument can be extended to trimer molecules:
the trimer of X, X3, would bind a 12 bp sequence,
5'-ACGTACGTACGT-3'/5'-ACGTACGTACGT-3', with a
theoretical equilibrium affinity constant of 8 x 10^15 M^-3.
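
The dimer and trimer figures follow one rule, the product of the subunit affinities, which also covers heteropolymers. A minimal sketch of that rule is given below. The function names are hypothetical, and the reverse-complement check simply confirms that the repeated ACGT site reads the same on both strands, as the 5'-...-3'/5'-...-3' notation above indicates.

```python
from math import prod

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def multimer_affinity(subunit_affinities):
    """Theoretical affinity of a multimer: the product of the affinities
    of its subunits (K**n for a homopolymer of n identical subunits)."""
    return prod(subunit_affinities)


def multimer_site(subunit_sites):
    """Target site of a multimer: the subunit sites joined end to end."""
    return "".join(subunit_sites)


monomer_site, monomer_k = "ACGT", 2e5  # molecule X from the text
for n in (1, 2, 3):
    site = multimer_site([monomer_site] * n)
    k = multimer_affinity([monomer_k] * n)
    same_on_both_strands = site == reverse_complement(site)
    print(f"X{n}: {site} ({len(site)} bp), affinity {k:.0e}, "
          f"self-complementary: {same_on_both_strands}")
```

For a heteropolymer the same call takes unequal affinities, for example `multimer_affinity([2e5, 1e6])` for two hypothetical subunits.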

DNA-binding polymers constructed using the
above-mentioned approach may be homo- or hetero-polymers of
the parent compounds or oligomeric compounds composed
of mixed subunits of the parent compounds. Homopolymers
are molecules constructed using two or more
subunits of the same monomeric DNA-binding molecule.
Heteropolymers are molecules constructed using two or
more subunits of different monomeric DNA-binding molecules.
Oligomeric compounds are constructed of mixed
pieces of parent compounds and may be hetero- or
homomeric.

For example, distamycin is a member of a family of
non-intercalating minor groove DNA-binding oligopeptides
that are composed of repeating units of
N-methylpyrrole groups. Distamycin has 3 N-methylpyrrole
groups. Examples of homopolymers would be
bis-distamycin, the dimer of distamycin, a molecule
containing 6 N-methylpyrrole groups, or tris-distamycin,
the trimer of distamycin, a molecule containing 9
N-methylpyrrole groups.

Daunomycin is a member of an entirely different
class of DNA-binding molecules, the anthracycline
antibiotics, that bind to DNA via intercalation.
Heteropolymers are molecules composed of different
types of DNA-binding subunits; for example, compounds
composed of a distamycin molecule linked to a daunomycin
molecule or a distamycin molecule linked to two
daunomycin molecules. The term "oligomeric" is being
used to describe molecules comprised of linked subunits,
each of which may be smaller than the parent compound.
An example of a homo-oligomeric compound would be
a distamycin molecule linked to 1 or 2 additional
N-methylpyrrole groups; the resulting molecule would not
be as large as bis-distamycin, but would fundamentally
be composed of the same component organic moieties that
comprise the parent molecule. An example of a
hetero-oligomeric compound would be daunomycin linked to one
or two N-methylpyrrole groups.

The construction of these polymers will be
directed by the information derived from the sequence
preferences of the parent compounds tested in the
assay. In one embodiment of the assay, a database of
preferred sequences is constructed, providing a source
of information about the 4 bp sequences that bind with
relatively higher affinity to particular drugs that may
be linked together to target any particular larger DNA
sequence.
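
One way such a database could be consulted is sketched below: a larger target is cut into consecutive 4 bp blocks, and each block is looked up among the preferred sites recorded for each drug. The drug names and preferred sites are invented for illustration, and the non-overlapping tiling is an assumption of this sketch, not a requirement stated in the text.

```python
def tile_target(target, unit=4):
    """Split a target sequence into consecutive, non-overlapping blocks."""
    if len(target) % unit:
        raise ValueError("target length must be a multiple of the unit size")
    return [target[i:i + unit] for i in range(0, len(target), unit)]


def candidate_subunits(target, preferences, unit=4):
    """For each block of the target, list the drugs whose preferred
    sites include that block."""
    plan = []
    for block in tile_target(target, unit):
        drugs = sorted(d for d, sites in preferences.items() if block in sites)
        plan.append((block, drugs))
    return plan


# Hypothetical database: drug name -> set of preferred 4 bp sites.
preferences = {
    "drug_A": {"ACGT", "AATT"},
    "drug_B": {"GGCC", "ACGT"},
    "drug_C": {"TTAA"},
}

plan = candidate_subunits("ACGTGGCC", preferences)
for block, drugs in plan:
    print(block, "->", ", ".join(drugs) if drugs else "no candidate")
```

A block with no candidate would indicate that further screening is needed before a polymer for that target can be designed.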

DNA-binding subunits can be chemically coupled to
form heteropolymers or homopolymers. The subunits can
be joined directly to each other, as in the family of
distamycin molecules, or the subunits can be joined
with a spacer molecule, such as carbon chains or
peptide bonds. The coupling of subunits is dependent
on the chemical nature of the subunits: appropriate
coupling reactions can be determined for any two
subunit molecules from the chemical literature. The
choice of subunits will be directed by the sequence to
be targeted and the data accumulated through the
methods discussed in Section VI.B of this application.

2. Sequence-Specific DNA-Binding Molecules
Identified in the Assay as Facilitators
of Triplex Formation.

Several types of nucleic acid base-containing
polymers have been described that will form complexes
with nucleic acids (for reviews, see Helene, C. and
Toulme, J.-J.). One type of such a polymer forms a
triple-stranded complex by the insertion of a third
strand into the major groove of the DNA helix. Several
types of base-recognition specific interactions of
third strand oligonucleotide-type polymers have been
observed. One type of specificity is due to Hoogsteen
bonding (Hoogsteen). This specificity arises from
recognition between pyrimidine oligonucleotides and
double-stranded DNA by pairing thymine and
adenine:thymine base pairs and protonated cytosine and
guanine:cytosine base pairs (Griffin, et al.). Another
type of specific interaction involves the use of purine
oligonucleotides for triplex formation. In these
triplexes, adenine pairs with adenine:thymine base
pairs and guanine with guanine:cytosine (Cooney, et
al.; Beal and Dervan) or thymine:adenine base pairs
(Griffin, L., and Dervan, P.B.).

Other motifs for triplex formation have been
described, including the incorporation of nucleic acid
analogs (e.g., methylphosphonates, phosphorothioates;
Miller, et al.), and the invention of backbones other
than the phosphoribose backbones normally found in
nucleic acids (Pitha, et al.; Summerton, et al.). In
several cases, the formation of triplex has been
demonstrated to inhibit the binding of a DNA-binding
protein (e.g., Young, et al.; Maher, et al.) or the
expression of a cellular protein (Cooney, et al.).

Furthermore, several experiments have been
reported in which a small DNA-binding molecule has been
covalently attached to a polymer capable of forming a
triplex structure: (i) an acridine:polypyrimidine
molecule has been demonstrated to inhibit SV40 in CV-1
cells (Birg, et al.); (ii) cleavage at a single site in
a yeast chromosome was achieved with an
oligonucleotide:EDTA-Fe molecule (Strobel, et al.; Dervan); and
(iii) a photoinducible endonuclease was created by a
similar strategy by attaching an ellipticine derivative
to a homopyrimidine oligonucleotide (Perouault, et
al.). Several other small intercalating agents coupled
to oligonucleotides have been described (for review,
see Montenay-Garestier).

One utility of the assay of the present invention
is to identify the sequence-specificity of DNA-binding
molecules for use in designing and synthesizing
heteromeric therapeutics consisting of a DNA-binding
polymer (e.g., an oligonucleotide) attached to a
sequence-preferential or sequence-specific DNA-binding
molecule, yielding a heteropolymer. The attached small
molecule may serve several functions.

First, if the molecule has increased affinity for
a specific site (such as a particular 4 base pair
sequence) over all other sites of the same size, then the
local concentration of the hetero-molecule, including
the oligonucleotide, will be increased at those sites.
The amount of heteropolymer, containing a
sequence-specific moiety attached to one end, needed for
treatment purposes is reduced compared to a heteropolymer
that has a non-specific DNA-binding moiety attached.
This reduction in treatment amount is directly
proportional to both the differential specificity and
the relative affinities between the sequence-specific
binder and the non-specific binder. For the simplest
example, if a sequence-specific molecule with absolute
specificity (i.e., it binds only one sequence) had
the same affinity for a specific 4 base-pair target site
(1/256 possible combinations) as a non-specific molecule,
then the amount of drug needed to exert the same
effective concentration at that site could potentially
be as much as 256-fold less for the specific drug than
for the non-specific drug. Accordingly, attaching a
sequence-specific DNA-binding molecule to a polymer designed to
form triplex structures allows increased localized
concentrations.

A second utility of the assay of the present
invention is to identify small molecules that cause
conformational changes in the DNA when they bind. The
formation of triplex DNA requires a shift from B form
to A form DNA. This is not energetically favorable,
necessitating the use of increased amounts of polymer
for triplex formation to drive the conformational
change. However, the insertion of a small DNA-binding
molecule (such as actinomycin D), which induces a
conformational change in the DNA, reduces the amount of
polymer needed to stabilize triplex formation.

Accordingly, one embodiment of the invention is to
use the assay to test known DNA-binding molecules with
all 256 possible four base pair test sequences to
determine the relative binding affinity to all possible
4 bp sequences. Then, once the sequence preferences
are known, the information may be used to design
heteropolymeric molecules comprised of a small
DNA-binding molecule and a macromolecule, such as a
triplex-forming oligonucleotide, to obtain a
DNA-binding molecule with enhanced binding characteristics.

The potential advantages of attaching a
sequence-specific or sequence-preferential DNA-binding small
molecule to a triplex forming molecule are to (i)
target the triplex to a subset of specific DNA
sequences and thereby (ii) anchor the triplex molecule
in the vicinity of its target sequence and in doing so,
(iii) increase the localized concentration of the
triplex molecule, which allows (iv) lower concentrations
of triplex to be used effectively. The presence
of the small molecule may also facilitate localized
perturbations in DNA structure, such as destabilizing
the B form of DNA, which is unsuitable for triplex
formation. Such destabilization may facilitate the
formation of other structures, such as A form DNA, useful
for triplex formation. The net effect would be to
decrease the amount of triplex needed for efficacious
results.
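
The embodiment described above, in which a molecule is tested against all 256 possible 4 bp test sequences, can be pictured with a small sketch that enumerates the sequences and ranks them by a measured relative affinity. The affinity values below are placeholders, not data, and the function names are illustrative.

```python
from itertools import product

BASES = "ACGT"


def all_test_sequences(length=4):
    """Every possible sequence of the given length (4**length of them)."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def rank_by_affinity(relative_affinity, top=5):
    """Sequences sorted from highest to lowest relative affinity."""
    ranked = sorted(relative_affinity.items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top]


sequences = all_test_sequences()
print(len(sequences), "test sequences")

# Placeholder measurements: 1.0 means average affinity.
measurements = {seq: 1.0 for seq in sequences}
measurements["ACGT"] = 50.0  # hypothetical high affinity site
measurements["AATT"] = 8.0   # hypothetical moderate affinity site

for seq, value in rank_by_affinity(measurements, top=2):
    print(seq, value)
```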

Other Applications.

The potential pharmaceutical applications for
sequence-specific DNA-binding molecules are very broad,
including antiviral, antifungal, antibacterial,
antitumor agents, immunosuppressants, and cardiovascular
drugs. Sequence-specific DNA-binding molecules can
also be useful as molecular reagents as, for example,
specific sequence probes.

As more DNA-binding molecules are detected,
information about their DNA binding affinities,
sequence recognition, and mechanisms of DNA-binding will
be gathered, eventually facilitating the design and/or
modification of new molecules with different or
specialized activities.

Although the assay has been described in terms of
the detection of sequence-specific DNA-binding molecules,
the reverse assay could be achieved by adding
[...] to look for peptide sequence
inhibitors.



The following examples illustrate, but in no way
are intended to limit, the present invention.

Materials and Methods 
Synthetic oligonucleotides were prepared using
commercially available automated oligonucleotide
synthesizers. Alternatively, custom designed synthetic
oligonucleotides may be purchased, for example, from
Synthetic Genetics (San Diego, CA). Complementary
strands were annealed to generate double-strand
oligonucleotides.

Restriction enzymes were obtained from Boehringer
Mannheim (Indianapolis IN) or New England Biolabs
(Beverly MA) and were used as per the manufacturer's
directions.

Distamycin A and Doxorubicin were obtained from
Sigma (St. Louis, MO). Actinomycin D was obtained from
Boehringer Mannheim or Sigma.

Standard cloning and molecular biology techniques 
are described in Ausubel, et al., and Sambrook, et al. 



Example 1

Preparation of the Oligonucleotide 
Containing the Screening Sequence 

This example describes the preparation of (A)
biotinylated/digoxigenin/radiolabeled, and (B)
radiolabeled double-stranded oligonucleotides that contain
the screening sequence and selected Test sequences.



A. Biotinylation.

The oligonucleotides were prepared as described
above. The wild-type control sequence for the UL9
binding site, as obtained from HSV, is shown in Figure
4. The screening sequence, i.e., the UL9 binding
sequence, is CGTTCGCACTT (SEQ ID NO:601) and is
underlined in Figure 4. Typically, sequences 5' and/or 3'
to the screening sequence were replaced by a selected
Test sequence (Figure 5).

One example of the preparation of a
site-specifically biotinylated oligonucleotide is outlined in
Figure 4. An oligonucleotide primer complementary to
the 3' sequences of the screening sequence-containing
oligonucleotide was synthesized. This oligonucleotide
terminated at the residue corresponding to the C in
position 9 of the screening sequence. The primer
oligonucleotide was hybridized to the oligonucleotide
containing the screening sequence. Biotin-11-dUTP
(Bethesda Research Laboratories (BRL), Gaithersburg MD)
and Klenow enzyme were added to this complex (Figure 4)
and the resulting partially double-stranded
biotinylated complexes were separated from the unincorporated
nucleotides using either pre-prepared "G-25 SEPHADEX"
spin columns (Pharmacia, Piscataway NJ) or "NENSORB"
columns (New England Nuclear) as per manufacturer's
instructions. The remaining single-strand region was
converted to double-strands using DNA polymerase I
Klenow fragment and dNTPs, resulting in a fully
double-stranded oligonucleotide. A second "G-25 SEPHADEX"
column was used to purify the double-stranded
oligonucleotide. Oligonucleotides were diluted or resuspended
in 10 mM Tris-HCl, pH 7.5, 50 mM NaCl, and 1 mM EDTA
and stored at -20°C. For radiolabelling the complexes,
32P-alpha-dCTP (New England Nuclear, Wilmington, DE)
replaced dCTP for the double-strand completion step.

Alternatively, the top strand, the primer, or the
fully double-stranded oligonucleotide have been
radiolabeled with γ-32P-ATP and polynucleotide kinase
(NEB, Beverly, MA). Most of our preliminary studies
have employed radiolabeled, double-stranded
oligonucleotides. The oligonucleotides are prepared by
radiolabeling the primer with T4 polynucleotide kinase
and γ-32P-ATP, annealing the "top" strand full length
oligonucleotide, and "filling-in" with Klenow fragment
and deoxynucleotide triphosphates. After phosphorylation
and second strand synthesis, oligonucleotides are
separated from buffer and unincorporated triphosphates
using "G-25 SEPHADEX" preformed spin columns (IBI, New
Haven, CT or Biorad, Richmond CA). This process is
outlined in Figure 4. The reaction conditions for all
of the above Klenow reactions were as follows: 10 mM
Tris-HCl, pH 7.5, 10 mM MgCl2, 50 mM NaCl, 1 mM
dithioerythritol, 0.33-100 µM deoxytriphosphates, 2
units Klenow enzyme (Boehringer-Mannheim, Indianapolis
IN). The Klenow reactions were incubated at 25°C for
15 minutes to 1 hour. The polynucleotide kinase
reactions were incubated at 37°C for 30 minutes to 1
hour.

B. End-Labeling with Digoxigenin.
The biotinylated, radiolabeled oligonucleotides
or radiolabeled oligonucleotides were isolated as above
and resuspended in 0.2 M potassium cacodylate (pH 7.2),
4 mM MgCl2, 1 mM 2-mercaptoethanol, and 0.5 mg/ml
bovine serum albumin. To this reaction mixture
digoxigenin-11-dUTP (an analog of dTTP,
2'-deoxyuridine-5'-triphosphate, coupled to digoxigenin via an
11-atom spacer arm, Boehringer Mannheim, Indianapolis
IN) and terminal deoxynucleotidyl transferase (GIBCO
BRL, Gaithersburg, MD) were added. The number of
Dig-11-dUTP moieties incorporated using this method
appeared to be less than 5 (probably only 1 or 2) as
judged by electrophoretic mobility on polyacrylamide
gels of the treated fragment as compared to
oligonucleotides of known length.

The biotinylated or non-biotinylated,
digoxigenin-containing, radiolabelled oligonucleotides were
isolated as above and resuspended in 10 mM Tris-HCl, 1
mM EDTA, 50 mM NaCl, pH 7.5 for use in the binding
assays.

The above procedure can also be used to biotinylate
the other strand by using an oligonucleotide
containing the screening sequence complementary to the
one shown in Figure 4 and a primer complementary to the
3' end of that molecule. To accomplish the biotinylation,
Biotin-7-dATP was substituted for Biotin-11-dUTP.
Biotinylation was also accomplished by chemical
synthetic methods: for example, an activated nucleotide
is incorporated into the oligonucleotide and the active
group is subsequently reacted with NHS-LC-Biotin
(Pierce). Other biotin derivatives can also be used.

C. Radiolabelling the Oligonucleotides.

Generally, oligonucleotides were radiolabelled
with gamma-32P-ATP or alpha-32P-deoxynucleotide
triphosphates and T4 polynucleotide kinase or the Klenow
fragment of DNA polymerase, respectively. Labelling
reactions were performed in the buffers and by the
methods recommended by the manufacturers (New England
Biolabs, Beverly MA; Bethesda Research Laboratories,
Gaithersburg MD; or Boehringer/Mannheim, Indianapolis
IN). Oligonucleotides were separated from buffer and
unincorporated triphosphates using "G-25 SEPHADEX"
preformed spin columns (IBI, New Haven, CT; or Biorad,
Richmond, CA) or "NENSORB" preformed columns (New
England Nuclear, Wilmington, DE) as per the
manufacturers' instructions.

There are several reasons to enzymatically
synthesize the second strand. The two main reasons are
that by using an excess of primer, second strand
synthesis can be driven to near completion so that
nearly all top strands are annealed to bottom strands,
which prevents the top strand single strands from
folding back and creating additional and unrelated
double-stranded structures, and secondly, since all of
the oligonucleotides are primed with a common primer,
the primer can bear the end-label so that all of the
oligonucleotides will be labeled to exactly the same
specific activity.

Example 2 

Preparation of the UL9 Protein

A. Cloning of the UL9 Protein-Coding Sequences
into pAC373.

To express full length UL9 protein a baculovirus
expression system has been used. The sequence of the
UL9 coding region of Herpes Simplex Virus has been
disclosed by McGeoch, et al. and is available as an EMBL
nucleic acid sequence. The recombinant baculovirus
AcNPV/UL9A, which contained the UL9 protein-coding
sequence, was obtained from Mark Challberg (National
Institutes of Health, Bethesda MD). The construction
of this vector has been previously described (Olivo, et
al. (1988, 1989)). Briefly, the NarI/EcoRV fragment
was derived from pMC160 (Wu, et al.). Blunt-ends were
generated on this fragment by using all four dNTPs and
the Klenow fragment of DNA polymerase I (Boehringer
Mannheim, Indianapolis IN) to fill in the terminal
overhangs. The resulting fragment was blunt-end
ligated into the unique BamHI site of the baculoviral
vector pAC373 (Summers, et al.).



B. Cloning of the UL9 Sequence in pVL1393.
The UL9 protein-coding region was cloned into a
second baculovirus vector, pVL1393 (Luckow, et al.).
The 3077 bp NarI/EcoRV fragment containing the UL9 gene
was excised from vector pEcoD (obtained from Dr. Bing
Lan Rong, Eye Research Institute, Boston, MA): the
plasmid pEcoD contains a 16.2 kb EcoRI fragment derived
from HSV-1 that bears the UL9 gene (Goldin, et al.).
Blunt-ends were generated on the UL9-containing
fragment as described above. EcoRI linkers (10 mer)
were blunt-end ligated (Ausubel, et al.; Sambrook, et
al.) to the blunt-ended NarI/EcoRV fragment.

The vector pVL1393 (Luckow, et al.) was digested
with EcoRI and the linearized vector isolated. This
vector contains 35 nucleotides of the 5' end of the
coding region of the polyhedrin gene upstream of the
polylinker cloning site. The polyhedrin gene ATG has
been mutated to ATT to prevent translational initiation
in recombinant clones that do not contain a coding
sequence with a functional ATG. The EcoRI/UL9 fragment
was ligated into the linearized vector, the ligation
mixture transformed into E. coli and ampicillin
resistant clones selected. Plasmids recovered from the
clones were analyzed by restriction digestion and
plasmids carrying the insert with the amino terminal
UL9 protein-coding sequences oriented to the 5' end of
the polyhedrin gene were selected. This plasmid was
designated pVL1393/UL9 (Figure 7).

pVL1393/UL9 was cotransfected with wild-type
baculoviral DNA (AcMNPV; Summers, et al.) into Sf9
(Spodoptera frugiperda) cells (Summers, et al.).
Recombinant baculovirus-infected Sf9 cells were
identified and clonally purified (Summers, et al.).

C. Expression of the UL9 Protein.

Clonal isolates of recombinant baculovirus
infected Sf9 cells were grown in Grace's medium as
described by Summers, et al. The cells were scraped
from tissue culture plates and collected by
centrifugation (2,000 rpm, for 5 minutes, 4°C). The cells were
then washed once with phosphate buffered saline (PBS)
(Maniatis, et al.). Cell pellets were frozen at -70°C.
For lysis the cells were resuspended in 1.5 volumes 20
mM HEPES, pH 7.5, 10% glycerol, 1.7 M NaCl, 0.5 mM
EDTA, 1 mM dithiothreitol (DTT), and 0.5 mM
phenylmethylsulfonyl fluoride (PMSF). Cell lysates were
cleared by ultracentrifugation (Beckman table top
ultracentrifuge, TLS 55 rotor, 34 krpm, 1 hr, 4°C).
The supernatant was dialyzed overnight at 4°C against
2 liters dialysis buffer (20 mM HEPES, pH 7.5, 10%
glycerol, 50 mM NaCl, 0.5 mM EDTA, 1 mM DTT, and 0.1 mM
PMSF).

These partially purified extracts were prepared
and used in DNA:protein binding experiments. If
necessary, extracts were concentrated using a "CENTRICON
30" filtration device (Amicon, Danvers MA).

D. Cloning the Truncated UL9 Protein.

The sequence encoding the C-terminal third of UL9
and the 3' flanking sequences, an approximately 1.2 kb
fragment, was subcloned into the bacterial expression
vector, pGEX-2T (Figure 6). The pGEX-2T is a modification
of the pGEX-1 vector of Smith, et al. which
involved the insertion of a thrombin cleavage sequence
in-frame with the glutathione-S-transferase protein
(gst).

A 1,194 bp BamHI/EcoRV fragment of pEcoD was
isolated that contained a 951 bp region encoding the
C-terminal 317 amino acids of UL9 and 243 bp of the 3'
untranslated region.
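
The fragment sizes quoted above are mutually consistent, as the following quick check shows. It assumes only three base pairs per codon; the variable names are illustrative.

```python
CODON_BP = 3  # base pairs per codon

c_terminal_residues = 317  # amino acids encoded, from the text
utr_bp = 243               # 3' untranslated region, from the text

coding_bp = c_terminal_residues * CODON_BP
fragment_bp = coding_bp + utr_bp

print(f"coding region: {coding_bp} bp")
print(f"BamHI/EcoRV fragment: {fragment_bp} bp")
```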

This BamHI/EcoRV UL9 carboxy-terminal (UL9-COOH)
containing fragment was blunt-ended and EcoRI linkers
added as described above. The EcoRI linkers were
designed to allow in-frame fusion of the UL9
protein-coding sequence to the gst-thrombin coding sequences.
The linkered fragment was isolated and digested with
EcoRI. The pGEX-2T vector was digested with EcoRI,
treated with Calf Intestinal Alkaline Phosphatase (CIP)
and the linear vector isolated. The EcoRI linkered
UL9-COOH fragment was ligated to the linear vector
(Figure 6). The ligation mixture was transformed into
E. coli and ampicillin resistant colonies were selected.
Plasmids were isolated from the ampicillin
resistant colonies and analyzed by restriction enzyme
digestion. A plasmid which generated a
gst/thrombin/UL9-COOH in frame fusion was identified (Figure 6) and
designated pGEX-2T/UL9-COOH.

E. Expression of the Truncated UL9 Protein.
E. coli strain JM109 was transformed with
pGEX-2T/C-UL9-COOH and was grown at 37°C to saturation
density overnight. The overnight culture was diluted
1:10 with LB medium containing ampicillin and grown
for one hour at 30°C. IPTG (isopropylthio-β-galactoside)
(GIBCO-BRL) was added to a final concentration of
0.1 mM and the incubation was continued for 2-5 hours.
Bacterial cells containing the plasmid were subjected
to the temperature shift and IPTG conditions, which
induced transcription from the tac promoter.

Cells were harvested by centrifugation and
resuspended in 1/100 culture volume of MTPBS (150 mM
NaCl, 16 mM Na2HPO4, 4 mM NaH2PO4). Cells were lysed by
sonication and lysates cleared of cellular debris by
centrifugation.

The fusion protein was purified over a glutathione
agarose affinity column as described in detail by
Smith, et al. The fusion protein was eluted from the
affinity column with reduced glutathione, dialyzed
against UL9 dialysis buffer (20 mM HEPES pH 7.5, 50 mM
NaCl, 0.5 mM EDTA, 1 mM DTT, 0.1 mM PMSF) and cleaved
with thrombin (2 ng/µg of fusion protein).

An aliquot of the supernatant obtained from
IPTG-induced cultures of pGEX-2T/C-UL9-COOH-containing cells
and an aliquot of the affinity-purified,
thrombin-cleaved protein were analyzed by SDS-polyacrylamide gel
electrophoresis. The result of this analysis is shown
in Figure 8. The 63 kilodalton GST/C-UL9 fusion
protein is the largest band in the lane marked GST-UL9
(lane 2). The first lane contains protein size
standards. The UL9-COOH protein band (lane GST-UL9 +
Thrombin, Figure 8, lane 3) is the band located between
30 and 46 kD: the glutathione transferase protein is
located just below the 30 kD size standard. In a
separate experiment a similar analysis was performed
using the uninduced culture: it showed no protein
corresponding in size to the fusion protein.

Extracts are dialyzed before use. Also, if
necessary, the extracts can be concentrated, typically
by filtration using a "CENTRICON 30" filter.

Example 3 

Binding Assays

A. Band Shift Gels.

DNA:protein binding reactions containing both
labelled complexes and free DNA were separated
electrophoretically on 4-10% polyacrylamide/Tris-Borate-EDTA
(TBE) gels (Fried, et al.; Garner, et al.). The gels
were then fixed, dried, and exposed to X-ray film. The
autoradiograms of the gels were examined for band shift
patterns.

B. Filter Binding Assays.

A second method used particularly in determining
the half-lives for oligonucleotide:protein complexes is
filter binding (Woodbury, et al.). Nitrocellulose
disks (Schleicher and Schuell, BA85 filters) that have
been soaked in binding buffer (see below) were placed
on a vacuum filter apparatus. DNA:protein binding
reactions (see below; typically 15-30 µl) are diluted
to 0.5 ml with binding buffer (this dilutes the
concentration of components without dissociating
complexes) and applied to the discs with vacuum
applied. Under low salt conditions the DNA:protein
complex sticks to the filter while free DNA passes
through. The discs are placed in scintillation
counting fluid (New England Nuclear), and the cpm
determined using a scintillation counter.
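
As a hedged illustration of how filter-retained counts might be turned into a half-life, the sketch below fits a first-order decay to cpm values measured at successive times. The first-order model, the background-corrected counts, and the synthetic data are all assumptions of this sketch; the text does not specify the calculation used.

```python
import math


def half_life(times, cpm):
    """Estimate a complex half-life from filter-retained counts.

    Assumes first-order dissociation, cpm(t) = cpm0 * exp(-k * t), and
    fits ln(cpm) against time by ordinary least squares.
    The result is in the same time units as `times`.
    """
    logs = [math.log(c) for c in cpm]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    return math.log(2) / -slope


# Synthetic, noise-free data generated with a 10 minute half-life.
times_min = [0, 5, 10, 20, 40]
counts = [10000 * 0.5 ** (t / 10) for t in times_min]

t_half = half_life(times_min, counts)
print(f"estimated half-life: {t_half:.1f} min")
```

With real counts, background (filter-only) cpm would need to be subtracted first, and time points near background should be excluded from the fit.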

This technique has been adapted to 96-well and
72-slot nitrocellulose filtration plates (Schleicher and
Schuell) using the above protocol except (i) the
reaction dilution and wash volumes are reduced, and
(ii) the flow rate through the filter is controlled by
adjusting the vacuum pressure. This method greatly
increases the number of assay samples that can be
analyzed. Using radioactive oligonucleotides, the
samples are applied to nitrocellulose filters, the
filters are exposed to x-ray film, then analyzed using
a Molecular Dynamics scanning densitometer. This
system transfers data directly into analytical
software programs (e.g., Excel) for analysis and
graphic display.



Example 4

Functional UL9 Binding Assay

A. Functional DNA-Binding Activity Assay.

Purified protein was tested for functional
activity using band-shift assays. Radiolabelled oligo-
nucleotides (prepared as in Example 1B) that contain
the 11 bp recognition sequence were mixed with the UL9
protein in binding buffer (optimized reaction condi-
tions: 0.1 ng 32P-DNA, 1 µl UL9 extract, 20 mM HEPES,
pH 7.2, 50 mM KCl, and 1 mM DTT). The reactions were
incubated at room temperature for 10 minutes (binding
occurs in less than 2 minutes), then separated electro-
phoretically on 4-10% non-denaturing polyacrylamide
gels. UL9-specific binding to the oligonucleotide is
indicated by a shift in mobility of the oligonucleotide
on the gel in the presence of the UL9 protein but not
in its absence. Bacterial extracts containing (+) or
lacking (-) UL9 protein and affinity purified UL9
protein were tested in the assay. Only bacterial
extracts containing UL9 or affinity purified UL9
protein generated the gel band-shift indicating protein
binding.

The amount of extract that needed to be added to
the reaction mix, in order to obtain UL9 protein excess
relative to the oligonucleotide, was empirically
determined for each protein preparation/extract.
Aliquots of the preparation were added to the reaction
mix and treated as above. The quantity of extract at
which the majority of the labelled oligonucleotide
appears in the DNA:protein complex was evaluated by
band-shift or filter binding assays. The assay is most
sensitive under conditions in which the minimum amount
of protein is added to bind most of the DNA. Excess
protein decreases the sensitivity of the assay with
respect to the ability of inhibitors to compete with
the protein for oligonucleotide binding, except when
protein concentrations are so high that non-specific
protein/DNA binding is provoked.

B. Rate of Dissociation.

The rate of dissociation is determined using a
competition assay. An oligonucleotide having the se-
quence presented in Figure 4, which contained the
binding site for UL9 (SEQ ID NO:614), was radiolabelled
with 32P-ATP and polynucleotide kinase (Bethesda
Research Laboratories). The competitor DNA was a 17
base pair oligonucleotide (SEQ ID NO:616) containing
the binding site for UL9.

In the competition assays, the binding reactions
(Example 4A) were assembled with each of the oligonu-
cleotides and placed on ice. Unlabelled oligonucleo-
tide (1 µg) was added 1, 2, 4, 6, or 21 hours before
loading the reaction on an 8% polyacrylamide gel (run
in TBE buffer (Maniatis, et al.)) to separate the
reaction components. The dissociation half-lives, under
these conditions, for the truncated UL9 (UL9-COOH) and
the full length UL9 are approximately 4 hours at 4°C.
In addition, random oligonucleotides (a 10,000-fold
excess) that did not contain the UL9 binding sequence
and sheared herring sperm DNA (a 100,000-fold excess)
were tested: neither of these control DNAs competed
for binding with the oligonucleotide containing the UL9
binding site.
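
As an illustration of how a time course from such a competition assay might be reduced to a single number, the sketch below fits a first-order decay to the fraction of labelled oligonucleotide remaining in the complex. The first-order model, the function name `half_life_hours`, and the example data are assumptions made for this illustration; they are not taken from the experiments described above.

```python
import math

def half_life_hours(times_h, fraction_bound):
    """Estimate a dissociation half-life from a competition time course.

    Assumes first-order decay, fraction = exp(-k * t), with all of the
    probe bound at t = 0.  The rate constant k is the least-squares
    slope of ln(fraction) against t for a line through the origin.
    """
    numerator = sum(t * math.log(f) for t, f in zip(times_h, fraction_bound))
    denominator = sum(t * t for t in times_h)
    k = -numerator / denominator
    return math.log(2) / k

# Hypothetical data: competitor present for 1, 2, 4, 6, or 21 hours,
# generated here from an ideal 4 hour half-life.
times = [1, 2, 4, 6, 21]
fractions = [0.5 ** (t / 4.0) for t in times]
estimated = half_life_hours(times, fractions)
```

With measured band intensities, `fraction_bound` would be the complex signal at each time point divided by the signal at the zero time point.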

C. Optimization of the UL9 Binding Assay.

1. Truncated UL9 from the Bacterial Ex-
pression System.

The effects of the following components on
the binding and dissociation rates of UL9-COOH with its
cognate binding site have been tested and optimized:
buffering conditions (including the pH, type of buffer,
and concentration of buffer); the type and concentra-
tion of monovalent cation; the presence of divalent
cations and heavy metals; temperature; various
polyvalent cations at different concentrations; and
different redox reagents at different concentrations.
The effect of a given component was evaluated starting
with the reaction conditions given above and based on
the dissociation reactions described in Example 4B.

The optimized conditions used for the binding of
UL9-COOH contained in bacterial extracts (Example 2E)
to oligonucleotides containing the HSV ori sequence
(SEQ ID NO:601) were as follows: 20 mM HEPES, pH 7.2,
50 mM KCl, 1 mM DTT, 0.005 - 0.1 ng radiolabeled
(specific activity, approximately 10^8 cpm/µg) or
digoxigenin-labelled, biotinylated oligonucleotide probe, and
5-10 µg crude UL9-COOH protein preparation (1 mM EDTA
is optional in the reaction mix). Under optimized
conditions, UL9-COOH binds very rapidly and has a
dissociation half-life of about 4 hours at 4°C with non-
biotinylated oligonucleotide and 5-10 minutes with
biotinylated oligonucleotides. The dissociation rate
of UL9-COOH changes markedly under different physical
conditions. Typically, the activity of a UL9 protein
preparation was assessed using the gel band-shift assay
and related to the total protein content of the extract
as a method of standardization. The addition of
herring sperm DNA depended on the purity of UL9 used in
the experiment. Binding assays were incubated at 25°C
for 5-30 minutes.

2. Full Length UL9 Protein from the Bacu-
lovirus System.

The binding reaction conditions for the full
length baculovirus-produced UL9 polypeptide have also
been optimized. The optimal conditions for the current
assay were determined to be as follows: 20 mM HEPES;
100 mM NaCl; 0.5 mM dithiothreitol; 1 mM EDTA; 5%
glycerol; from 0 to 10^4-fold excess of sheared herring
sperm DNA; 0.005 - 0.1 ng radiolabeled (specific
activity, approximately 10^8 cpm/µg) or digoxigenin-labelled,
biotinylated oligonucleotide probe; and 5-10 µg crude
UL9 protein preparation. The full length protein also
binds well under the optimized conditions established
for the truncated UL9-COOH protein.

Example 5 

The Effect of Test Sequence Variation on the
Half-Life of the UL9 DNA:Protein Complex

The oligonucleotides shown in Figure 5 were
radiolabelled as described above. The competition
assays were performed as described in Example 4B using
UL9-COOH. Radiolabelled oligonucleotides were mixed
with the UL9-COOH protein in binding buffer (typical
reaction: 0.1 ng oligonucleotide 32P-DNA, 1 µl UL9-
COOH extract, 20 mM HEPES, pH 7.2, 50 mM KCl, 1 mM
EDTA, and 1 mM DTT). The reactions were incubated at
room temperature for 10 minutes. A zero time point
sample was then taken and loaded onto an 8% polyacryl-
amide gel (run using TBE). One µg of the unlabelled 17
bp competitor DNA oligonucleotide (SEQ ID NO:616)
(Example 4B) was added at 5, 10, 15, 20, or 60 minutes
before loading the reaction sample on the gel. The
results of this analysis are shown in Figure 9: the
test sequences that flank the UL9 binding site
(SEQ ID NO:605-SEQ ID NO:613) are very dissimilar but
have little effect on the off-rate of UL9. According-
ly, these results show that the UL9 DNA binding protein
is effective to bind to a screening sequence in duplex
DNA with a binding affinity that is substantially




independent of test sequences placed adjacent the 
screening sequence. Filter binding experiments gave 
the same result. 

Example 6

The Effect of Actinomycin D, Distamycin A, and
Doxorubicin on UL9 Binding to the Screening Sequence
is Dependent on the Specific Test Sequence

Different oligonucleotides, each of which con-
tained the screening sequence (SEQ ID NO:601) flanked
on the 5' and 3' sides by a test sequence (SEQ ID
NO:605 to SEQ ID NO:613), were evaluated for the
effects of distamycin A, actinomycin D, and doxorubicin
on UL9-COOH binding.

Binding assays were performed as described in
Example 5. The oligonucleotides used in the assays are
shown in Figure 5. The assay mixture was allowed to
pre-equilibrate for 15 minutes at room temperature
prior to the addition of drug.

A concentrated solution of Distamycin A was
prepared in dH2O and was added to the binding reactions
at the following concentrations: 0, 1 µM, 4 µM, 16 µM,
and 40 µM. The drug was added and incubated at room
temperature for 1 hour. The reaction mixtures were
then loaded on an 8% polyacrylamide gel (Example 5) and
the components separated electrophoretically. Autora-
diographs of these gels are shown in Figure 10A. The
test sequences tested were as follows: UL9 polyT, SEQ
ID NO:609; UL9 CCCG, SEQ ID NO:605; UL9 GGGC, SEQ ID
NO:606; UL9 polyA, SEQ ID NO:608; and UL9 ATAT, SEQ ID
NO:607. These results demonstrate that Distamycin A
preferentially disrupts binding to UL9 polyT, UL9 polyA
and UL9 ATAT.

A concentrated solution of Actinomycin D was
prepared in dH2O and was added to the binding reactions
at the following concentrations: 0 µM and 50 µM. The
drug was added and incubated at room temperature for 1
hour. Equal volumes of dH2O were added to the control
samples. The reaction mixtures were then loaded on an
8% polyacrylamide gel (Example 5) and the components
separated electrophoretically. Autoradiographs of
these gels are shown in Figure 10B. In addition to the
test sequences tested above with Distamycin A, the
following test sequences were also tested with Actino-
mycin D: AToril, SEQ ID NO:611; oriEco2, SEQ ID
NO:612; and oriEco3, SEQ ID NO:613. These results
demonstrate that actinomycin D preferentially disrupts
the binding of UL9 to the oligonucleotides UL9 CCCG and
UL9 GGGC.

A concentrated solution of Doxorubicin was
prepared in dH2O and was added to the binding reactions
at the following concentrations: 0 µM, 15 µM and 35 µM.
The drug was added and incubated at room temperature
for 1 hour. Equal volumes of dH2O were added to the
control samples. The reaction mixtures were then loaded
on an 8% polyacrylamide gel (Example 5) and the
components separated electrophoretically. Autoradio-
graphs of these gels are shown in Figure 10C. The same
test sequences were tested as for Actinomycin D. These
results demonstrate that Doxorubicin preferentially
disrupts the binding of UL9 to the oligonucleotides
UL9 polyT, UL9 GGGC, oriEco2, and oriEco3. Doxorubicin
appears to particularly disrupt the UL9:screening se-
quence interaction when the test sequence oriEco3 is
used. The test sequences for oriEco2
and oriEco3 differ by only one base: an additional T
residue inserted at position 12 (compare SEQ ID NO:612
and SEQ ID NO:613).

Example 7

Use of the Biotin/Streptavidin Reporter System

A. The Capture of Protein-Free DNA.

Several methods have been employed to sequester
unbound DNA from DNA:protein complexes.

1. Magnetic Beads.

Streptavidin-conjugated superparamagnetic
polystyrene beads (Dynabeads M-280 Streptavidin, Dynal
AS, 6-7x10^8 beads/ml) are washed in binding buffer then
used to capture biotinylated oligonucleotides (Example
1). The beads are added to a 15 µl binding reaction
mixture containing binding buffer and biotinylated oli-
gonucleotide. The beads/oligonucleotide mixture is
incubated for varying lengths of time with the binding
mixture to determine the incubation period that maximizes
capture of protein-free biotinylated oligonucleotides.
After capture of the biotinylated oligonucleotide, the
beads can be retrieved by placing the reaction tubes in
a magnetic rack (96-well plate magnets are available
from Dynal). The beads are then washed.



2. Agarose Beads.

Biotinylated agarose beads (immobilized D-
biotin, Pierce, Rockford, IL) are bound to avidin by
treating the beads with 50 ng/µl avidin in binding
buffer overnight at 4°C. The beads are washed in
binding buffer, then mixed with binding mixtures to
capture biotinylated DNA. The beads are removed by
centrifugation or by collection on a non-binding filter disc.

For either of the above methods, quantification of
the presence of the oligonucleotide depends on the
method of labelling the oligonucleotide. If the oligo-
nucleotide is radioactively labelled: (i) the beads
and supernatant can be loaded onto polyacrylamide gels
to separate DNA:protein complexes from the bead:DNA
complexes by electrophoresis, and autoradiography
performed; or (ii) the beads can be placed in scintilla-
tion fluid and counted in a scintillation counter.
Alternatively, presence of the oligonucleotide can be
determined using a chemiluminescent or colorimetric
detection system.

B. Detection of Protein-Free DNA.

The DNA is end-labelled with digoxigenin-11-dUTP
(Example 1). The antigenic digoxigenin moiety is
recognized by an antibody-enzyme conjugate, anti-
digoxigenin-alkaline phosphatase (Boehringer Mannheim,
Indianapolis, IN). The DNA/antibody-enzyme conjugate is
then exposed to the substrate of choice. The presence
of dig-dUTP does not alter the ability of protein to
bind the DNA or the ability of streptavidin to bind
biotin.

1. Chemiluminescent Detection.

Digoxigenin-labelled oligonucleotides are
detected using the chemiluminescent detection system
"SOUTHERN LIGHTS" developed by Tropix, Inc. (Bedford,
MA). Use of this detection system is illustrated in
Figures 11A and 11B. The technique can be applied to
detect DNA that has been captured on either beads or
filters.

Biotinylated oligonucleotides, which have terminal
digoxigenin-containing residues (Example 1), are
captured on magnetic (Figure 11A) or agarose beads
(Figure 11B) as described above. The beads are
isolated and treated to block non-specific binding by
incubation with I-Light blocking buffer (Tropix) for 30
minutes at room temperature. The presence of oligonu-
cleotides is detected using alkaline phosphatase-
conjugated antibodies to digoxigenin. Anti-digoxi-
genin-alkaline phosphatase (anti-dig-AP, 1:5000
dilution of 0.75 units/µl, Boehringer Mannheim) is
incubated with the sample for 30 minutes, decanted, and
the sample washed with 100 mM Tris-HCl, pH 7.5, 150 mM
NaCl. The sample is pre-equilibrated with 2 washes of
50 mM sodium bicarbonate, pH 9.5, 1 M MgCl2, then
incubated in the same buffer containing 0.25 mM 3-(2'-
spiroadamantane)-4-methoxy-4-(3'-phosphoryloxy)phenyl-
1,2-dioxetane disodium salt (AMPPD) for 5 minutes at
room temperature. AMPPD was developed (Tropix Inc.) as
a chemiluminescent substrate for alkaline phosphatase.
Upon dephosphorylation of AMPPD the resulting compound
decomposes, releasing a prolonged, steady emission of
light at 477 nm.

Excess liquid is removed from filters and the
emission of light occurring as a result of the dephos-
phorylation of AMPPD by alkaline phosphatase can be
measured by exposure to X-ray film or by detection in
a luminometer.

In solution, the bead-DNA-anti-dig-AP is resuspen-
ded in "SOUTHERN LIGHT" assay buffer and AMPPD and
measured directly in a luminometer. Large scale
screening assays are performed using a 96-well plate-
reading luminometer (Dynatech Laboratories, Chantilly,
VA). Subpicogram quantities of DNA (10^2 to 10^3 atto-
moles (an attomole is 10^-18 moles)) can be detected
using the Tropix system in conjunction with the plate-
reading luminometer.

2. Colorimetric Detection.

Standard alkaline phosphatase colorimetric
substrates are also suitable for the above detection
reactions. Typical substrates include 4-nitrophenyl
phosphate (Boehringer Mannheim). Results of colorime-
tric assays can be evaluated in multiwell plates (as
above) using a plate-reading spectrophotometer (Molecu-
lar Devices, Menlo Park, CA). The light
emission system is more sensitive than the colorimetric
systems.

Example 8

Labelling Test Oligonucleotides to
Equivalent Specific Activities

The top strands of 256 oligonucleotides, contain-
ing all possible 4 bp sequences in the test sites
flanking the UL9 recognition site, were synthesized.
The oligonucleotides were composed of identical se-
quences except for the 4 bp sites flanking either side
of the UL9 recognition sequence (SEQ ID NO:601). The
oligonucleotides had the general sequence presented in
Figure 14B (SEQ ID NO:617), where XXXX is the test se-
quence and N = A, G, C, or T. A 12 bp primer sequence,
which is the complementary sequence to the 3'-end of
the test oligonucleotide, was also synthesized: the
primer was designated the HSV primer and is presented
as SEQ ID NO:618.
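
The figure of 256 follows from four bases at each of four positions (4^4 = 256). As a small illustration, the test sequences can be enumerated as below; the function name `all_test_sequences` is hypothetical, and the flanking sequence of Figure 14B is not reproduced here.

```python
from itertools import product

def all_test_sequences(length=4, alphabet="ACGT"):
    """Return every possible test sequence of the given length."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]

test_sequences = all_test_sequences()
```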

The HSV primer was used to prime second strand
synthesis and to facilitate labeling the oligonucleo-
tides to the same specific activity. Oligonucleotide
labelling was accomplished by labeling the 5' end of
the HSV primer and then using the same primer to prime
second strand synthesis of all 256 test oligonucleo-
tides. The 5' end of the primer can be labeled with
radioisotopes such as 32P, 33P, or 35S, or with non-
radioactive detection systems such as digoxigenin or
biotin as discussed in the Capture/Detection section.

Radioactive labeling of the primer with 32P is
accomplished by the enzymatic transfer of a radioactive
phosphate from γ-32P-ATP to the 5' end of the primer
oligonucleotide using T4 polynucleotide kinase (Ausu-
bel, et al.). For labeling 256 oligonucleotides,
approximately 60 µg HSV primer was labeled as follows.
The oligonucleotide was incubated for 1 hour at 37°C
with 125 µl γ-32P-ATP (20 mCi total, 7000 Ci/mmol) and
600 units of T4 polynucleotide kinase in a 3 ml
reaction volume containing 50 mM Tris-HCl, pH 7.5, 10
mM MgCl2, 10 mM spermidine, and 1.5 mM dithiothreitol
(freshly prepared). To stop the reaction, EDTA was
added to a final concentration of 20 mM. Unincorporat-
ed nucleotides were removed using "G-25 SEPHADEX"
chromatography in 10 mM Tris-HCl, pH 7.5, 50 mM NaCl,
and 1 mM EDTA (TE+50).

The radioactive primer was individually annealed
to the top strand of each of the 256 test oligonucleo-
tides. The bottom strand was synthesized using
deoxyribonucleotides and Klenow fragment or T4 polymer-
ase (Ausubel, et al.). The annealing mixture typically
contained 200 ng HSV primer mixed with 1 µg top strand
in 20 mM Tris-HCl, pH 7.5, 1 mM spermidine, and 0.1 mM
EDTA (35 µl reaction volume). The primer was annealed
to the top strand by incubating the sample for 2
minutes at 70°C, then placing the sample at room
temperature or on ice. To the annealing mixture, 4.5
µl 10X Klenow buffer (10X = 200 mM Tris-HCl, 500 mM
NaCl, 50 mM MgCl2, 10 mM dithiothreitol), 5 µl 0.5 mM
each dNTP (dATP, dCTP, dGTP, dTTP), and 1 µl Klenow
fragment were added. This reaction mixture was
incubated 30-60 minutes at room temperature (or up to
37°C).

The volume of the reaction mixture was increased
by adding 75 µl of a solution of 10 mM Tris-HCl, pH 7.5,
50 mM NaCl, and 10 mM EDTA. The reaction mixture was
applied to a 1 ml "G-25 SEPHADEX" (in TE+50) spin
column. The spin columns were prepared by plugging 1 cc
tuberculin syringes with silanized glass wool and
adding a slurry of "G-25 SEPHADEX." The columns were
prespun at 2000 rpm in a tabletop centrifuge for 4
minutes. The samples (reaction mixtures) were passed
through the column by centrifugation (2000 rpm, 4
minutes at room temperature) to remove unincorporated
deoxyribonucleotides. The incorporation of 32P was
measured by placing a small volume of the sample in
scintillation fluor and determining the disintegrations
per minute (dpms) in a scintillation counter.

The radiolabeled double-stranded oligonucleotides
were then diluted to the same specific activity (equal
dpms per volume). Typically, a concentration of 0.1 to
1 ng/µl oligonucleotide was used in the assay.
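
The dilution step is simple arithmetic, sketched below under the assumption that every sample is diluted down to the activity of the least active sample. The function name `diluent_volumes`, the choice of the minimum as the target, and the example counts are all hypothetical.

```python
def diluent_volumes(dpm_per_ul, sample_ul):
    """Volume of buffer (in µl) to add to each sample so that all
    samples end up with the same dpm per µl.

    dpm_per_ul maps an oligonucleotide name to its measured dpm/µl;
    sample_ul is the starting volume of each sample.  The target is
    the lowest measured activity (an assumption of this sketch).
    """
    target = min(dpm_per_ul.values())
    return {
        name: sample_ul * (dpm / target - 1.0)
        for name, dpm in dpm_per_ul.items()
    }

# Hypothetical scintillation counts for three labelled duplexes.
counts = {"TTCC": 40000.0, "TACC": 20000.0, "ACGG": 30000.0}
to_add = diluent_volumes(counts, sample_ul=10.0)
```

For these example values, the TTCC sample would receive 10 µl of buffer, ACGG 5 µl, and TACC none.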

The same procedure can be used for second strand
synthesis and labeling to equal specific activity
regardless of the type of label on the HSV primer.

Example 9

An Arrayed Sample Format

Screening large numbers of test molecules or test
sequences is most easily accomplished in an arrayed
sample format, for example, a 96-well plate format.
Such formats are readily amenable to automation using
robotics systems. Several different types of dispos-
able plastic plates are available for use in screening
assays including the following: polyvinyl chloride
(PVC), polypropylene (PP), polyethylene (PE), and
polystyrene (PS) plates. Plates, or any testing
vehicle in which the assay is performed, are tested for
protein and DNA adsorption and coated with a blocking
reagent if necessary.

One method for testing protein or DNA adsorption
to plates is to place assay mixtures in the wells of
the plates for varying lengths of time. Samples are
then removed from the wells and a nitrocellulose dot
blot capture system (Ausubel, et al.; Schleicher and
Schuell) is used to measure the amount of DNA:protein
complex remaining in the mixture over time.

When radiolabeled oligonucleotides are used for
the test, signal can be measured using autoradiography
and a scanning laser densitometer. A decrease in the
amount of DNA:protein complex in the absence of
competitor molecules is indicative of plate adsorption.
If plate adsorption occurs, the plates are coated with
a blocking agent prior to use in the assay.

None of the plates listed above showed marked
adsorption at a 30 minute time point under the condi-
tions of the assay. However, most plates, regardless
of brand, showed significant adsorption at times
greater than 2 hours.

Coating the plates with a blocking agent decreases
variability in the assay. Several types of blocking
reagents typically used to block the adsorption of
macromolecules to plastic are known, primarily from
immunoscreening procedures. For example, plates may be
blocked with either 1% bovine serum albumin (BSA) in
phosphate-buffered saline (PBS), or 0.1% gelatin, 0.05%
"TWEEN20" in PBS.

To test for the effectiveness of using such
blocking reagents, the plates were treated with the
above reagents for 1 hour at room temperature, then
washed three times with 0.05% "TWEEN20" in PBS and once
with the assay buffer. Assay reaction mixtures were
aliquoted to the plates and tested as described above
using dot blot capture assays. Both of the blocking
reagents (BSA or gelatin) were effective in blocking
DNA and protein binding, except when polypropylene
plates were used. Based on these experiments, PVC
plates blocked with BSA were determined to work well in
the assay of the present invention.

Plates were tested for inter- and intra-plate
variability by aliquoting duplicate samples to all 96
wells of several plates, and determining the amount of
DNA:protein complex recovered using the dot
blot/nitrocellulose system. The coefficient of
variation [%CV = (standard deviation/mean) x 100] was
calculated for intra-plate variability (i.e., between
samples on the same plate) and inter-plate variability
(i.e., between plates). Blocked PVC plates showed an
intra-plate %CV of 5-20%; inter-plate variability was
about 8%.
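
The %CV calculation can be sketched as follows. The text does not say whether a sample or population standard deviation was used, nor exactly how inter-plate variability was computed; this sketch assumes the sample standard deviation and takes inter-plate %CV over the per-plate means. The function names and readings are hypothetical.

```python
from statistics import mean, stdev

def percent_cv(values):
    """Coefficient of variation: (standard deviation / mean) x 100."""
    return stdev(values) / mean(values) * 100.0

def plate_variability(plates):
    """Return (list of intra-plate %CVs, inter-plate %CV).

    plates is a list of plates, each a list of well readings.
    Inter-plate %CV is taken over the per-plate means (an assumption).
    """
    intra = [percent_cv(wells) for wells in plates]
    inter = percent_cv([mean(wells) for wells in plates])
    return intra, inter

# Hypothetical densitometry readings from three plates.
readings = [
    [90.0, 100.0, 110.0],
    [95.0, 105.0, 100.0],
    [108.0, 112.0, 110.0],
]
intra_cv, inter_cv = plate_variability(readings)
```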



Example 10

Sequence Selectivity and Relative Binding
Affinity for Distamycin

Using the assay method of the present invention,
distamycin was tested for sequence selectivity and
relative binding affinity to 256 different 4 bp se-
quences.



A. The Assay Mixture.

Water, buffer and UL9 were mixed on ice and
aliquoted to the wells of a 96-well plate. The
addition of water/UL9/buffer mix was accomplished with
an 8-channel repipettor, which holds a relatively large
volume and allowed rapid, accurate pipetting to all 96
wells of a master experimental plate.

Radiolabeled double-stranded oligonucleotides were
aliquoted from 96-well master stock plates (containing
the array of all 256 oligonucleotides diluted to the
same specific activity) to the wells of the master
experimental plates.

Master assay mixtures in the master experimental
plates were thoroughly mixed by pipetting up and down.
The mixtures were aliquoted to the test plates. Each
test plate typically included one sample as a control
(no test molecules added) and as many test samples as
were needed for different test molecules or test mole-
cule concentrations. There were 3 master oligonucleo-
tide stock plates, containing the array of 256 oligonu-
cleotides. Accordingly, an experiment testing distamy-
cin at different concentrations would require 256
control assays (one for each oligonucleotide) and 256
assays at each of the drug concentrations to be tested.

The following assay mixture was used for testing
distamycin in the assay of the present invention: 1.5
nM radiolabeled DNA and 12.8 nM UL9-COOH protein
(prepared as described above in the UL9 binding buffer:
20 mM HEPES, pH 7.2, 50 mM KCl, and 1 mM dithio-
threitol). The concentration of the components in the
assay mixture can be varied as described above in the
Detailed Description.

Assay mixtures containing both UL9 and DNA were
incubated at room temperature for at least 10 minutes
to allow the DNA:protein complexes to form and for the
system to come to equilibrium. At time = 0, the assay
was begun by adding water (control samples) or distamy-
cin (5-15 µM, test samples) to the assay mixtures using
a 12-channel micropipettor. After incubation with drug
for 5-120 minutes, samples were taken and applied to
nitrocellulose on a 96-well dot blot apparatus (Schlei-
cher and Schuell). The samples were held at 4°C.

Tests were performed in duplicate. Typically, one
set of 256 test oligonucleotides was scrambled with
respect to location on the 96-well plate to eliminate
any effects of plate location.
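
One way to lay out 256 oligonucleotides across three 96-well plates, with an optional scrambled copy for the duplicate set, is sketched below. The well naming (rows A-H, columns 1-12), the function name `plate_layout`, the placeholder oligonucleotide names, and the use of a seeded shuffle are assumptions of this illustration, not details given in the text.

```python
import random

ROWS = "ABCDEFGH"        # 8 rows of a 96-well plate
COLUMNS = range(1, 13)   # 12 columns

def plate_layout(oligo_names, scramble=False, seed=None):
    """Map each oligonucleotide to a (plate number, well) position.

    Wells are filled row by row, plate after plate.  With
    scramble=True the oligonucleotides are shuffled first, so that
    the duplicate set occupies different plate locations.
    """
    names = list(oligo_names)
    if scramble:
        random.Random(seed).shuffle(names)
    wells_per_plate = len(ROWS) * len(COLUMNS)
    layout = {}
    for index, name in enumerate(names):
        plate, offset = divmod(index, wells_per_plate)
        row, col = divmod(offset, len(COLUMNS))
        layout[name] = (plate + 1, f"{ROWS[row]}{COLUMNS[col]}")
    return layout

names = [f"oligo_{i:03d}" for i in range(1, 257)]  # placeholder names
ordered = plate_layout(names)
scrambled = plate_layout(names, scramble=True, seed=1)
```

Both layouts use the same 256 wells; only the assignment of oligonucleotides to wells differs.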



B. The Capture/Detection System.

A 96-well dot blot apparatus was used to capture
the DNA:protein complexes on a nitrocellulose filter.
The filters used in the dot blot apparatus were
pretreated as follows. The nitrocellulose filter was
pre-wetted with water and soaked in UL9 binding buffer.
The filter was then placed on 1 to 3 pieces of 3MM
filter paper, which were also presoaked in UL9 binding
buffer. All filters were chilled to 4°C prior to
placement in the apparatus.

Prior to the application of the assay sample to
the wells of the dot-blot apparatus, the wells were
filled with 375 µl of UL9 binding buffer. Typically,
5-50 µl of sample (usually 10-15 µl) were pipetted into
the wells containing binding buffer and a vacuum
applied to the system to pull the sample through the
nitrocellulose. Unbound DNA passes through the
nitrocellulose; protein-bound DNA sticks to the
nitrocellulose. The filters were dried and exposed to
X-ray film to generate autoradiographs.

C. Quantitation of Data.

The autoradiographs of the nitrocellulose filters
were analyzed with a Molecular Dynamics (Sunnyvale, CA)
scanning laser densitometer using an ImageQuant
software package (Molecular Dynamics). Using this
software, a 96-well grid was placed on the image of the
autoradiograph and the densitometer calculated the
"volume" of each dot ("volume" is equivalent to the
density of each pixel in the grid square multiplied by
the area of the grid square). The program automatical-
ly subtracts background. The background was determined
by either the background of a line or object drawn
outside the grid or by using the gridlines as back-
ground for each individual dot.

The data are exported to a spreadsheet program,
such as "EXCEL" (Microsoft Corporation, Redmond, WA),
for further analysis.
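
The "volume" calculation can be illustrated with a minimal sketch. It reads the definition above as mean pixel density, less background, multiplied by the number of pixels in the grid square; that reading, the function name `dot_volume`, and the pixel values are assumptions, and the sketch does not reproduce the ImageQuant software.

```python
def dot_volume(pixels, background=0.0):
    """Background-corrected "volume" of one grid square.

    pixels is a list of rows of pixel densities; background is the
    per-pixel background density to subtract.
    """
    area = sum(len(row) for row in pixels)      # number of pixels
    total = sum(sum(row) for row in pixels)     # summed density
    mean_density = total / area
    return (mean_density - background) * area

# Hypothetical 2 x 2 grid square with a background of 1 per pixel.
volume = dot_volume([[3.0, 5.0], [4.0, 4.0]], background=1.0)
```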

D. Analysis of Data.

The data generated from the densitometry analysis
were analyzed using the spreadsheet program "EXCEL."

For each test oligonucleotide, at each drug
concentration and/or each time point, a raw % score was
calculated. The raw % score (r%) can be described as

r% = (T/C) x 100

where T was the densitometry volume of the test sample
and C was the densitometry volume of the control
sample. The oligonucleotides were then ranked from 1
to 256 based on their r% score. Further calculations
were based on the rank of each oligonucleotide with
respect to all other oligonucleotides.
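
The r% calculation and ranking can be sketched as follows. The sketch assumes that rank 1 goes to the lowest r% (the largest loss of DNA:protein complex), which is inferred from the later statement that low rankings indicate a strong effect of distamycin. The function names and densitometry volumes are hypothetical.

```python
def raw_percent_scores(test_volumes, control_volumes):
    """r% = (T / C) x 100 for each test oligonucleotide."""
    return {
        seq: test_volumes[seq] / control_volumes[seq] * 100.0
        for seq in test_volumes
    }

def rank_by_score(scores):
    """Rank sequences from 1 (lowest r%) upward."""
    ordered = sorted(scores, key=lambda seq: scores[seq])
    return {seq: position + 1 for position, seq in enumerate(ordered)}

# Hypothetical densitometry volumes for three test sequences.
test = {"TTCC": 20.0, "GGGG": 90.0, "TACC": 45.0}
control = {"TTCC": 100.0, "GGGG": 100.0, "TACC": 100.0}
r_scores = raw_percent_scores(test, control)
ranks = rank_by_score(r_scores)
```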

The rank of each oligonucleotide was averaged over
several experiments (where one experiment is equivalent
to testing all 256 test oligonucleotides by the assay
of the present invention) in view of the variability in
rank between any two experiments. The confidence level
for the ranking of the oligonucleotides increased with
repetition of the experiment.

Figure 15 shows the results of 4 separate experi-
ments with distamycin. The test samples were treated
with 10 µM distamycin for 30 minutes. The r% scores
are shown for each of the 4 experiments (labeled 918A,
918B, 1022A, and 1022B) and the ranks of each oligonu-
cleotide in each experiment are shown. The test oligo-
nucleotides have been ranked from 1 to 256 based on
their average rank. The average rank was the sum of
the ranks in the individual experiments divided by the
number of experiments.
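
Averaging ranks across experiments and re-ranking on the average can be sketched as below. The function names and rank values are hypothetical, and ties in average rank are broken here by input order, a choice the text does not specify.

```python
def average_ranks(rank_tables):
    """Average rank of each sequence: the sum of its ranks in the
    individual experiments divided by the number of experiments."""
    count = len(rank_tables)
    return {
        seq: sum(table[seq] for table in rank_tables) / count
        for seq in rank_tables[0]
    }

def final_ranking(averages):
    """Re-rank sequences from 1 upward on their average rank."""
    ordered = sorted(averages, key=lambda seq: averages[seq])
    return {seq: position + 1 for position, seq in enumerate(ordered)}

# Hypothetical ranks from two experiments on three sequences.
experiment_a = {"TTCC": 1, "TACC": 3, "GGGG": 2}
experiment_b = {"TTCC": 2, "TACC": 1, "GGGG": 3}
averages = average_ranks([experiment_a, experiment_b])
overall = final_ranking(averages)
```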

Figures 16 and 17 show the results presented in
Figure 15 in graphic form. Figure 16 shows the average
ranks plotted against the ideal ranks 1 to 256. Figure
17 shows the average r% scores plotted against the rank
of 1 to 256. These data demonstrate the reproducible
ability of the assay to detect differential binding and
effects of distamycin on different 4 bp sequences.

Example 11

Determining a Consensus Binding Site for Distamycin

One method used to determine the sequence prefer-
ences for distamycin was to examine the sequences that
rank highest in the assay for sequence similarities.
This process may be accomplished visually or by
designing computer programs to inspect the data.

Using the data shown in Figure 15, consensus se-
quences can be constructed for distamycin in the
following manner. Sequences with rankings less than 50
(indicating a strong effect of distamycin on the test
sequence) in all four experiments were:



TABLE VI

Sequence    Rank
TTCC        1
TTAC        2
TACC        3
TATC        4
TTCG        6
ACGG        8




Sequences with rankings less than 50 (indicating 
a strong effect of distamycin on the test sequence) in 
three of the four experiments were: 

TABLE VII

Sequence    Rank
AACG        5
TTTC        7
TTAG        10
TAAC        12
TACG        15
AGAC        17
AAAC        18
AGCG        21
AGCC        22
TTCT        24
ACGC        25
AGGG        28
AGGC        30
TTGC        37
ATCG        39
TTTG        43



Sequences with rankings less than 50 (indicating
a strong effect of distamycin on the test sequence) in
two of the four experiments were:



TABLE VIII

Sequence    Rank
TAGC        9
TTGG        11
AAAG        13
AACC        14
CAAC        16
ATCC        19
AAGG        20
TAAG        23
ACCC        26
TCCC        29
TATG        31
ACCG        32
TCGG        34
AGTC        35
CTCG        38
AATC        44
AGAG        46
TTAA        47
ACAC        48
AGTG        49
TCAC        52


The following assumptions allow prediction of a
consensus sequence for a distamycin recognition
sequence: (i) the most favored sequences are the test
sequences that rank in the top 50 in all four
experiments; (ii) the next favored sequences will be the
test sequences that rank in the top 50 in 3 of 4
experiments; and (iii) the next favored sequences will be
the test sequences that rank in the top 50 in 2 of 4
experiments.
The positions in the test sequence are represented
by the numerals 1, 2, 3 and 4. One consensus sequence
that can be predicted from the above binding data is:

1    2      3    4
T    T/A    N    C/G

The nucleotides at each position can also be
ranked:

1    2      3          4
T    T>A    C>A>T>G    C>G

Furthermore, the importance of the position of the
nucleotide can be ranked. Examination of these data
indicates that the importance of the positions is
1 > 4 > 2 > 3.

These data can be tested for validity by deriving
all possible consensus sequences and examining their
scores in the assay. The consensus sequences derived
from the above information, in order of rank as
predicted by the consensus sequence, are:



TABLE IX

Sequence    Predicted Rank    Actual Rank
TTCC        1                 1
TACC        2                 3
TTCG        3                 6
TACG        4                 15
TTAC        5                 2
TAAC        6                 12
TTAG        7                 10
TAAG        8                 23
TTTC        9                 7
TATC        10                4
TTTG        11                43
TATG        12                31
TTGC        13                37
TAGC        14                9
TTGG        15                11
TAGG        16                58

Average rank:                 17



Note that the actual rank numbers are out of a
possible 256 and that only one number is greater than
50. The average rank of these 16 oligos is only 17.
These data indicate that the consensus sequence has
predictive value.
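The check summarized in Table IX can be reproduced with a short script. The following is a minimal sketch, not part of the original analysis: it expands the consensus T, T/A, N, C/G into its 16 member sequences and averages the actual ranks copied from Table IX. The names `CONSENSUS`, `ACTUAL_RANK` and `expand_consensus` are illustrative only, and the expansion order is that of `itertools.product`, not the predicted-rank order of the table.

```python
from itertools import product

# Actual assay ranks copied from Table IX (out of 256 test sequences).
ACTUAL_RANK = {
    "TTCC": 1, "TACC": 3, "TTCG": 6, "TACG": 15,
    "TTAC": 2, "TAAC": 12, "TTAG": 10, "TAAG": 23,
    "TTTC": 7, "TATC": 4, "TTTG": 43, "TATG": 31,
    "TTGC": 37, "TAGC": 9, "TTGG": 11, "TAGG": 58,
}

# Bases allowed at positions 1-4 of the consensus T, T/A, N, C/G.
CONSENSUS = ["T", "TA", "CATG", "CG"]


def expand_consensus(consensus):
    """Return every sequence that matches the per-position base choices."""
    return ["".join(bases) for bases in product(*consensus)]


sequences = expand_consensus(CONSENSUS)
average_rank = sum(ACTUAL_RANK[s] for s in sequences) / len(sequences)
over_50 = [s for s in sequences if ACTUAL_RANK[s] > 50]

print(len(sequences), average_rank, over_50)
```

With the Table IX values the script reports 16 sequences, an average rank of 17.0, and TAGG as the only member ranking above 50.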

Using the same data, a second consensus sequence
can be derived that has a slightly worse average rank
with respect to the relative effect of distamycin in
the assay.



TABLE X

Position     1    2        3        4
Consensus    A    A/G/C    G/C/A    G/C
Ranking           A>G=C    C>A=G

The test sequences predicted by this consensus se- 
quence are as follows: 



TABLE XI

Sequence    Actual rank
AACG        5
AACC        14
AAAG        13
AAAC        18
AAGG        20
AAGC        74
AGCG        21
AGCC        22
AGAG        46
AGAC        17
AGGG        28
AGGC        30
ACCG        32
ACCC        26
ACAG        73
ACAC        48
ACGG        8
ACGC        25

Ave. rank:  29



This consensus sequence also appears to be
predictive of favored distamycin binding sites since
the average rank of test oligonucleotides predicted by
this sequence is 29, substantially below the median
rank of 128. However, the sequences predicted by this
consensus sequence do not appear to be affected as
strongly by distamycin as the sequences in the first
consensus sequence, described above.



Example 12

Testing Actinomycin D to Determine Sequence
Specificity and Relative Binding Affinity

A. Ranking of Actinomycin D Sequence Binding
Affinities.

Actinomycin D has been tested for sequence
selectivity and relative binding affinity to the 256
different 4 bp sequences. The assay was performed
essentially as described in Example 10. One assay
mixture useful for the testing of actinomycin D
contained 1.5 nM radiolabeled DNA and 12.8 nM UL9-COOH
protein, prepared as described above, in the UL9 binding
buffer (20 mM Hepes, pH 7.2, 50 mM KCl, and 1 mM
dithiothreitol). The concentration of the components
can be varied as described in the Detailed Description.

The assay mixtures containing both UL9 and DNA
were incubated at room temperature for at least 10
minutes to allow the DNA:protein complexes to form and
for the system to come to equilibrium. At time = 0,
the assay was begun by adding water (control samples)
or actinomycin D (25 μM, test samples) to the assay
mixtures using a 12-channel micropipettor. After
incubation with drug for 30 minutes, samples were taken
and applied to nitrocellulose filters using a 96-well
dot blot apparatus (Schleicher and Schuell) held at
4°C. Figure 18 shows the results of 8 screens of
actinomycin D.

The % reduction in DNA:protein complex as a result
of the presence of actinomycin D is called "r%"; the
lower the r% score, the more effective the test molecule
in blocking the DNA:protein interaction. For each
screen, the test oligonucleotides have been ranked from
1 to 256, based on the r% score; the rank of 1 denotes
the lowest r% score (the test oligonucleotide most
affected by the test molecule), the rank of 256 denotes
the highest r% score (the test oligonucleotide least
affected by the test molecule). The table also shows
the average r% score and average rank of each test
oligonucleotide; the averages are calculated from the sum
of the individual scores and ranks divided by the
number of screens, respectively. The test oligonucleotides
are then ranked from 1 to 256 based on the
average rank in all screens. The final ranking is
shown in the two external columns on the table. Test
oligonucleotides ranking less than 50 in any individual
screen are shown in highlighted boxes.
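The ranking procedure described above can be sketched in a few lines. This is an illustrative outline only: the r% values below are made up for four sequences and three screens (they are not assay data), the function names are hypothetical, and ties are broken by input order, which is an assumption the text does not address.

```python
def rank_scores(scores):
    """Rank keys 1..N by ascending value (rank 1 = lowest value)."""
    ordered = sorted(scores, key=lambda seq: scores[seq])
    return {seq: i + 1 for i, seq in enumerate(ordered)}


def final_ranking(screens):
    """Average each sequence's per-screen rank, then rank those averages."""
    per_screen = [rank_scores(screen) for screen in screens]
    avg_rank = {
        seq: sum(ranks[seq] for ranks in per_screen) / len(per_screen)
        for seq in screens[0]
    }
    return avg_rank, rank_scores(avg_rank)


# Hypothetical r% scores (lower = more strongly affected by the drug).
screens = [
    {"AGCT": 40.0, "GGCC": 55.0, "TTAA": 85.0, "ACGT": 60.0},
    {"AGCT": 50.0, "GGCC": 45.0, "TTAA": 90.0, "ACGT": 70.0},
    {"AGCT": 35.0, "GGCC": 50.0, "TTAA": 80.0, "ACGT": 45.0},
]

avg_rank, final = final_ranking(screens)
print(avg_rank)
print(final)
```

Ranking the average ranks, instead of the averaged r% scores, keeps a single unusually strong or weak screen from dominating the final order.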

Figure 19 shows the final rank of test oligonucleotides
screened with actinomycin D plotted against the
average r% score for these test oligonucleotides.
Figure 20 shows the final ranking vs. the ranks in
each individual experiment, the average rank, and the
ideal rank.

B. Analysis of the Data Obtained from Ranking
Actinomycin D Sequence Binding Affinities.

Several simple analytical procedures may be
applied to the data from the screens.

1. Position Effects.

First, to examine possible preferences of the
test molecule for a base at any particular position in
the test site, the average r% scores are examined. The
average r% scores for the 64 possible test oligonucleotides
having a given base at a given position in the test
site are averaged. For example, to determine the effect of
having an A in the first position of the test site, the
"A1" position, the average r% scores for the 64 test
oligonucleotides with A in the first position are
averaged. The results of this analysis are shown in
Figure 21. The mean score for all oligonucleotides in
these screens was an r% value of 67; the standard
deviation was 11.8.

If the r% score is expressed as variance from the
mean, as shown in Figure 21, one observes that none of
the scores is markedly deviant from the mean. These
results suggest that a single base in any particular
position has little impact on the binding of
actinomycin D to the test site.
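The position-effect calculation can be illustrated as follows. This is a sketch under stated assumptions: the score table is synthetic (a flat r% of 67 with 10 points subtracted whenever position 1 is G), and `position_effects` is a hypothetical helper, not a routine from the original work.

```python
from itertools import product
from statistics import mean

BASES = "ACGT"


def position_effects(avg_scores):
    """Mean r% of the 64 sequences sharing a base at a position,
    expressed as a difference from the overall mean r%."""
    overall = mean(avg_scores.values())
    effects = {}
    for pos in range(4):
        for base in BASES:
            subset = [s for seq, s in avg_scores.items() if seq[pos] == base]
            effects[(pos + 1, base)] = mean(subset) - overall
    return effects


# Synthetic average r% scores for all 256 test sequences (not assay data).
scores = {
    "".join(p): 67.0 - (10.0 if p[0] == "G" else 0.0)
    for p in product(BASES, repeat=4)
}

effects = position_effects(scores)
print(effects[(1, "G")], effects[(1, "A")], effects[(2, "A")])
```

In this synthetic table only position 1 shows any deviation; with the actinomycin D data the text reports that no single position deviates markedly.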

2. Dinucleotide Analysis.

The results of the actinomycin D screen were
examined for the presence of dinucleotide pairs that
scored well or poorly in the rankings. High scores
indicate a preference for the test sequence. Low
scores indicate a repulsion of actinomycin D for the
test sequence. A dinucleotide analysis is one of many
simple analytical procedures that may be applied to the
data to extract meaningful impressions about the nature
of the sequences to which the test molecule has high
affinity.

The data are examined in a manner similar to that
used for the single nucleotide analysis. The 16
possible average r% scores for any particular dinucleotide
combination are examined. Specific adjacent
dinucleotides ($N_1N_2$, $N_2N_3$, $N_3N_4$) or adjacent
dinucleotide pairs at any position ($N_xN_{x+1}$ = the
average of $N_1N_2$, $N_2N_3$, and $N_3N_4$) may be examined,
as well as specific dinucleotide pairs that are not adjacent
($N_1N_3$, $N_2N_4$, $N_1N_4$) and any dinucleotide pair
separated by one base ($N_xN_{x+2}$ = the average of
$N_1N_3$ and $N_2N_4$). The means for each set are
determined as well as standard deviations.

The difference from the mean (i.e., the mean score
less the average r% score for any particular dinucleotide)
reflects the extent of deviation from the norm.
Differences from the mean greater than 2-3 standard
deviations are considered to be significant.
The data for the dinucleotide analysis of
actinomycin D are shown in Figure 22. The differences
from the mean are displayed graphically in Figure 23.
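The dinucleotide comparison can be sketched along the same lines. The example below is illustrative only: the scores are synthetic (r% of 67 everywhere, lowered by 30 when positions 1 and 2 read GC), the use of the population standard deviation and a cutoff of 2 standard deviations are assumptions, and the function name is hypothetical.

```python
from itertools import product
from statistics import mean, pstdev

BASES = "ACGT"


def dinucleotide_deviation(avg_scores, i, j):
    """For 0-based positions i and j, map each dinucleotide to
    (mean score minus its own average r%, that difference in SD units)."""
    means = {}
    for a, b in product(BASES, repeat=2):
        subset = [s for seq, s in avg_scores.items()
                  if seq[i] == a and seq[j] == b]
        means[a + b] = mean(subset)
    centre = mean(list(means.values()))
    sd = pstdev(list(means.values()))
    return {
        d: (centre - m, (centre - m) / sd if sd else 0.0)
        for d, m in means.items()
    }


# Synthetic average r% scores for all 256 test sequences (not assay data).
scores = {
    "".join(p): 67.0 - (30.0 if p[0] == "G" and p[1] == "C" else 0.0)
    for p in product(BASES, repeat=4)
}

result = dinucleotide_deviation(scores, 0, 1)
significant = sorted(d for d, (_, n_sd) in result.items() if abs(n_sd) > 2.0)
print(significant)
```

A positive difference means the dinucleotide's average r% lies below the mean, that is, the test molecule had a stronger than average effect on sequences containing it.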

In reference to Figures 22 and 23, the dinucleotide
preference of actinomycin D is GC, particularly in
the $N_1N_2$ position, but also at any ($N_xN_{x+1}$) adjacent
dinucleotide sequence in the test site.

If the data are combined in a combined bar chart,
shown in Figure 24, where the cumulative results for
any dinucleotide pair are tabulated in a single bar,
the overall observation can be made that actinomycin D
prefers GC-rich sequences over AT-rich sequences, with
a particular preference for the dinucleotide pairs
involving GC.

Example 13

A Method for Selecting Target Sites for DNA-Binding
Molecules that are Dimers or Trimers of Distamycin

Once the relative binding preferences of
distamycin have been determined, sequences are selected
for target sites for DNA-binding molecules composed of
two distamycin molecules, bis-distamycins, or three
distamycin molecules, tris-distamycins.

A. Selecting Sequences for Binding with Highest
Affinity to Distamycin Oligomers.

The top binding sites for distamycin, determined
as described above, are defined by the consensus
sequence 5'-T:T/A:C/A:C-3'; accordingly, the top
sequences are TTCC, TTAC, TACC and TAAC. Using this
information, $2^4 = 16$ possible dimer sequences, i.e.,
combinations of the four top binding sequences, can be
targeted by a bis-distamycin in which the distamycin
molecules are immediately adjacent to one another.
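The 16 adjacent dimer targets can be listed mechanically. The following is a small sketch; `adjacent_targets` is an illustrative helper name, and the only input taken from the text is the list of four top binding sequences.

```python
from itertools import product

# The four top distamycin binding sequences named in the text.
TOP_SITES = ["TTCC", "TTAC", "TACC", "TAAC"]


def adjacent_targets(sites, n_units):
    """All targets formed by placing n_units favored sites end to end."""
    return ["".join(combo) for combo in product(sites, repeat=n_units)]


dimer_targets = adjacent_targets(TOP_SITES, 2)
print(len(dimer_targets))
print(dimer_targets[:4])
```

Each entry is the top strand of an 8 bp site in which two favored 4 bp sites are immediately adjacent.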

The top strands of the 16 possible duplex DNA
target sites for binding bis-distamycins are shown in
Figure 25. Similarly, trimers of distamycin,
tris-distamycins, could be targeted toward selected 12 bp
sequences, comprised of all possible combinations of
the four 4 bp sequences. There are $3^4 = 81$ possible
highest affinity target trimer sequences.

There are several advantages to targeting longer
sequences with bis- or tris-distamycin:

B. As the Number of Potential Target Sites
Decreases, Specificity Increases.

All 8 bp combinatorial possibilities of the 4 top
favored binding sites for distamycin are potential high
affinity binding sites for bis-distamycin. The
consensus sequence used in this example predicts four
favored binding sites for distamycin. This represents
$(4/4^4) \times 100$ = about 1.6% of the possible 4 bp sites
in the genome. Since there are $4^8$ possible 8 bp
sequences, the favored 8 bp sites represent, on average,
only $(2^4/4^8) \times 100$ = about 0.02% of the total genome.
Since there are $4^{12}$ possible 12 bp sequences, the
favored 12 bp sites represent, on average, only
$(3^4/4^{12}) \times 100$ = about 0.0005% of the genome.

The following discussion provides perspective and
illustrates the improvement in the actual number of
target sites in the human genome when using a dimer
of distamycin versus a monomer of distamycin. The
human genome is about $3 \times 10^9$ bp. If the number of
favored target sites for distamycin is four, and the
number of possible 4 bp sequences is $4^4 = 256$, then the
number of favored target sites in the genome is
$(4/256)(3 \times 10^9) = 4.7 \times 10^7$, or about 50 million
favored target sites.
Given that the number of possible 8 bp sites is $4^8$
= 65,536, if all possible combinatorial 8 bp sites
derived from the favored 4 bp sites ($2^4 = 16$; Figure
25) are favored, then the number of favored 8 bp target
sites is $(16/65{,}536)(3 \times 10^9) = 7.3 \times 10^5$, or about
700,000 possible sites. This represents a 64-fold
reduction in the number of highest affinity target
sites between distamycin and bis-distamycin;
alternatively, this result can be viewed as a 64-fold
increase in specificity.

Likewise, given that the number of possible 12 bp
sites is $4^{12} = 1.7 \times 10^7$, if all possible favored 12 bp
sites ($3^4 = 81$) are favored, then the number of favored
12 bp target sites is $(81/1.7 \times 10^7)(3 \times 10^9) \approx 1.4 \times 10^4$;
i.e., 14,000 possible highest affinity sites.
This represents an approximately 3000-fold decrease in
the number of highest affinity target sites between
distamycin and tris-distamycin and an approximately
50-fold decrease in the number of highest affinity
target sites between bis-distamycin and tris-distamycin.
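The arithmetic in this section can be reproduced as follows. This is a sketch that assumes, as the text does, that every sequence of a given length is equally frequent in the genome; the favored-site counts (4, 16 and 81) are taken from the text as stated, and `expected_sites` is an illustrative name.

```python
GENOME_BP = 3e9  # approximate size of the human genome used in the text


def expected_sites(n_favored, site_length_bp, genome_bp=GENOME_BP):
    """Expected number of favored sites of a given length in the genome,
    assuming all sequences of that length occur equally often."""
    return n_favored / 4 ** site_length_bp * genome_bp


# Favored-site counts as stated in the text for monomer, dimer and trimer.
monomer = expected_sites(4, 4)
dimer = expected_sites(16, 8)
trimer = expected_sites(81, 12)

print(f"{monomer:.2g} {dimer:.2g} {trimer:.2g}")
print(f"{monomer / dimer:.0f}-fold fewer sites for the dimer")
print(f"{monomer / trimer:.0f}-fold fewer sites for the trimer")
```

The computed fold reductions differ slightly from the rounded figures quoted above because the text rounds $4^{12}$ to $1.7 \times 10^7$.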

C. An Exponential Increase in Affinity.

As the target site increases in size, (i) the
number of target sites in a defined number of nucleotides
decreases, and (ii) the specificity increases.
Further, the affinity of binding is typically the
product of the binding affinities of component parts
(see Section VI.E.1 above). As an example, the
published binding constant for distamycin to bulk
genomic DNA is about $2 \times 10^5\ \mathrm{M}^{-1}$. Dimers of distamycin
will have a theoretical binding affinity of the square
of the binding constant of distamycin:

$K_{\text{dista}} = 2 \times 10^5\ \mathrm{M}^{-1}$; $K_{\text{bis-dista}} = (2 \times 10^5)^2 = 4 \times 10^{10}\ \mathrm{M}^{-1}$.

Trimers of distamycin will have binding
affinities of the cube of the binding affinity of
distamycin:

$K_{\text{tris-dista}} = (2 \times 10^5\ \mathrm{M}^{-1})^3 = 8 \times 10^{15}\ \mathrm{M}^{-1}$.

Thus, if distamycin shows only a 10-fold higher
affinity ($2 \times 10^6\ \mathrm{M}^{-1}$) for the top favored binding sites
than the average binding sites in DNA, then the
affinity constant for bis-distamycin to an 8 bp site
comprised of two favored binding sites is 100-fold
higher than for an 8 bp sequence comprised of two
average binding sites:

$K_{\text{bis-dista, favored sites}} / K_{\text{bis-dista, average sites}} = (2 \times 10^6)^2 / (2 \times 10^5)^2 = 100$.

While this does not represent absolute sequence
specificity in binding, the binding affinity is
100-fold greater for 0.02% (16/65,536) of the total
possible 8 bp target sequences.

The use of a trimer targeted sequence will afford
an even higher increase in affinity to the most favored
binding sites:

$K_{\text{tris-dista, favored sites}} / K_{\text{tris-dista, average sites}} = (2 \times 10^6)^3 / (2 \times 10^5)^3 = 1000$.

Thus, with only 10-fold differential activity
in binding between favored sites and average sites, a
1000-fold difference in affinity can be achieved by
designing trimer molecules to specific target sites.
When considering the administration of DNA-binding
molecules as drugs, a 1000-fold lower dose of
tris-distamycin, versus the distamycin monomer, could be
administered and an increase in relatively specific
binding to selected target sites achieved.

In this example, the differential activity of
distamycin is only 10-fold. Clearly, differential
activities of larger magnitudes will greatly accentuate
the increased affinity effect. For example, a 100-fold
difference in activity of a 4 bp DNA-binding molecule
toward high affinity and average affinity sequences
would result in (i) a 10,000-fold difference in the
binding affinity of a dimer of the molecule targeted to
an 8 bp sequence, and (ii) a million-fold increase in
the binding affinity of the trimer to a 12 bp sequence.
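The scaling described in this section follows directly from the assumption that an oligomer's affinity is the product of the affinities of its units. A minimal sketch (the function name is illustrative, and the product rule is the text's own simplifying assumption):

```python
import math


def fold_preference(k_favored, k_average, n_units):
    """Ratio of oligomer affinities for favored versus average sites,
    assuming affinity is the product of the n_units unit affinities."""
    return (k_favored / k_average) ** n_units


K_AVERAGE = 2e5  # binding constant for bulk DNA quoted in the text
K_FAVORED = 2e6  # 10-fold higher affinity assumed for favored sites

dimer_fold = fold_preference(K_FAVORED, K_AVERAGE, 2)
trimer_fold = fold_preference(K_FAVORED, K_AVERAGE, 3)

# The same calculation for a hypothetical 100-fold monomer preference.
dimer_fold_100 = fold_preference(100.0, 1.0, 2)
trimer_fold_100 = fold_preference(100.0, 1.0, 3)

print(dimer_fold, trimer_fold, dimer_fold_100, trimer_fold_100)
```

Only the ratio of favored to average affinity matters for the fold preference, which is why the second pair of calls can use 100 and 1 directly.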

D. Selecting Target Sequences for Distamycin
Oligomers with Flexible and/or Variable-
Length Linkers in Between the Distamycin
Moieties.

The sequences that can be targeted with bis- or
tris-distamycin molecules are not limited to sequences
in which the two 4 bp favored binding sites are
immediately adjacent to one another. Flexible linkers
can be placed between the distamycin moieties and
sequences can be targeted that are not immediately
adjacent. The target sequences can have distances of
1 to several bases between them; this distance depends
on the length of the chemical linker. Examples of
target sequences for bis-distamycins with
internal flexible and/or variable length linkers
targeted to sites comprised of two TTCC sequences are
shown in Figure 26, where N is any base.

For each particular bis-distamycin, the explanations
of increased affinity and specificity remain the
same as described above with the following exception.
For the case in which the linker is sufficiently
flexible to span different numbers of bases in between
the two distamycin sites, the number of sites targeted
with highest affinity would be multiplied by the number
of bases spanned.

In respect to the ease of drug design and target
selection, there are several advantages to the above
described targeting strategies, including the following:
i) Any conformational changes induced by binding
at the half-site would be minimized.

ii) The affinity, therefore, would be more likely
to be the product of the affinities of the interactions
observed for the monomeric sites.

iii) The half-molecule (e.g., one distamycin unit)
would anchor the bis-molecule (e.g., bis-distamycin),
thus increasing the localized concentration for the
binding of the second half of the bis-molecule.

iv) If a simple linking chain is used, with a
variable number of atoms, the number of sites that can
be targeted by multimers of the monomer increases.

This targeting method can be of value when, for
example, there are no medically significant target
sites with adjacent favored binding sites for distamycin
and therefore no good target sites for
bis-distamycin. In this situation, the database can be
screened for additional target sequences with $N_1$ to
$N_n$ (where N is any base) between the two target binding
sequences. For example, where n=4, the number of
sequences to be searched becomes $(4^2) \times 4 = 64$. The
likelihood of finding such a sequence is reasonably
high.
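A database search of this kind can be sketched as follows. The example is illustrative only: it uses the four favored sites named earlier in this Example, represents the spacer bases as N, and scans a made-up DNA string; `gapped_templates` and `find_gapped_sites` are hypothetical helper names.

```python
from itertools import product

FAVORED = ["TTCC", "TTAC", "TACC", "TAAC"]


def gapped_templates(sites, max_gap):
    """Every template: left site, 1..max_gap spacer bases (N), right site."""
    return [
        left + "N" * gap + right
        for left, right in product(sites, repeat=2)
        for gap in range(1, max_gap + 1)
    ]


def find_gapped_sites(dna, sites, max_gap, site_len=4):
    """Return (start, gap, matched text) for each pair of favored sites
    separated by 1..max_gap bases in the given DNA string."""
    hits = []
    for i in range(len(dna) - site_len + 1):
        if dna[i:i + site_len] not in sites:
            continue
        for gap in range(1, max_gap + 1):
            j = i + site_len + gap
            if dna[j:j + site_len] in sites:
                hits.append((i, gap, dna[i:j + site_len]))
    return hits


templates = gapped_templates(FAVORED, 4)
hits = find_gapped_sites("GGTTCCAATTACGG", FAVORED, 4)  # made-up sequence
print(len(templates), hits)
```

With four favored sites and spacers of one to four bases, the number of templates is 16 x 4 = 64, matching the count given above.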






E. Selecting a Specific Target Site.

Using the above approach, a sequence was identified
from the medically significant target site
database that contains SEQ ID NO:619, which is a subset
of the group of sequences represented by SEQ ID NO:620.
SEQ ID NO:619 occurs overlapping the binding site for
a transcription factor, Nuclear Factor of Activated T
Cells (NFAT-1), which is a major regulatory factor in
the induction of interleukin 2 expression early in the
T cell activation response. NFAT-1 is crucial in (i)
the T cell response, and (ii) blocking the expression
of IL-2, which causes immunosuppression. The
sequences TTCC and TTTC, the distamycin target binding
sequences in SEQ ID NO:619, rank first and seventh in
the assay.

Example 14

The Use of the Assay in Competition Studies

The assay of the present invention measures the
effect of the binding of a DNA-binding molecule to a
test site by the release of a protein from an adjacent
screening site. Accordingly, the assay is an indirect
assay. Following here is the description of an
application of the assay useful to provide confirmatory
evidence of the data obtained in the initial screening
processes.

The results of the distamycin screening assay
described in Example 10 suggested that there were
possible false negatives: specifically, test sequences
that bind distamycin but fail to show an effect on the
binding of the reporter protein. The data suggesting
false negatives were as follows. If the assay detected
strictly the affinity of binding of distamycin, then
the scores of the test sequences complementary to the
high-scoring test sequences should always be equally
high. However, an examination of the highest ranking
test sequences and the complementary test sequences
reveals that this is not the case (see Table XII).



TABLE XII

Rank    Test Sequence    Complement    Rank of complement
1       TTCC             GGAA          42
2       TTAC             GTAA          244
3       TACC             GGTA          185
4       TATC             GATA          213
5       AACG             CGTT          144
6       TTCG             CGAA          216
7       TTTC             GAAA          235

All but one of the complementary sequences rank in
the lower half, 4 of them in the lowest 20%; i.e.,
there was little effect on reporter protein binding in
the presence of distamycin when using these sequences
as test sequences in the assay.
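The complement column of Table XII and the counts quoted above can be checked with a short script. This is a sketch only: the ranks are copied from Table XII, "lower half" is read as a rank above 128, and "lowest 20%" as a rank above 0.8 x 256; both readings are assumptions about how the text counts.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# (rank, test sequence, rank of complement) copied from Table XII.
TABLE_XII = [
    (1, "TTCC", 42), (2, "TTAC", 244), (3, "TACC", 185), (4, "TATC", 213),
    (5, "AACG", 144), (6, "TTCG", 216), (7, "TTTC", 235),
]

complements = {seq: reverse_complement(seq) for _, seq, _ in TABLE_XII}
lower_half = [seq for _, seq, rank in TABLE_XII if rank > 128]
lowest_fifth = [seq for _, seq, rank in TABLE_XII if rank > 0.8 * 256]

print(complements)
print(len(lower_half), len(lowest_fifth))
```

Six of the seven complements fall in the lower half of the ranking and four in the lowest fifth, in agreement with the statement above.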

This observation reflects the usefulness of a
confirmatory assay that examines the relative affinity
of a particular sequence for binding distamycin. A
confirmatory assay may also be useful in revealing
additional information about the physical characteristics
of drug binding. For example, one can hypothesize
that the reason for the apparent inverse relationship
between test sequences with high activity in the assay
and their complements is that the effect of distamycin
is directional and only active at one test site. This
hypothesis can be tested using the following competition
experiment. Competitor oligonucleotides, containing
test sequences of interest, are added to the assay
mixture. This allows the determination of which test
sequences compete most effectively with the radiolabeled
test oligonucleotide for binding distamycin.

Assay mixtures are prepared as described in
Example 10, using a high-ranking test oligonucleotide,
e.g., TTCC (ranking = #1), as the radiolabeled
oligonucleotide in the experiment. The test oligonucleotide
TTCC is labeled to high specific activity with γ-32P-ATP
as described in Example 8; in this example, the
labeled TTCC oligonucleotide will be referred to as the
"high specific activity test oligonucleotide".

The competitor oligonucleotides are labeled as
described in Example 8, except that the ATP used for
kinasing the primer is 1:200 radiolabeled:nonradiolabeled.
In other words, the competitor oligonucleotides
are tracer labeled with radioactive phosphorus to a
200-fold lower specific activity than the high specific
activity test oligonucleotide. Since all of the
competitor oligonucleotides are labeled with the same
radiolabeled primer molecule, the relative concentrations
of the competitor DNAs can be determined with
high accuracy. Further, since the specific activity is
the same, the concentrations can be adjusted to be the
same. For the purposes of this example, the competitor
DNAs are referred to as "low specific activity
competitor oligonucleotides."

The use of competitor DNAs for which the concentration
is known is important for the competition
experiment. The accuracy of the competition assay may
be further enhanced by separating any unincorporated
radiolabeled primer from the double stranded competitor
oligonucleotides. This separation can be achieved
using, for example, a 6-20% polyacrylamide gel. The
gel is then exposed to x-ray film and the amount of
double-stranded oligonucleotide determined by use of a
scanning laser densitometer, essentially as described
in the Examples above.

The competition assay is performed as described in
Example 10, except that competitor DNAs are added in
increasing relative concentration to the high specific
activity test oligonucleotide. The DNA concentration
([DNA]) is held constant and the UL9 concentration
([UL9]) and distamycin concentration ([distamycin]) are
as described in Example 10. The components in the
competition assay samples are as follows.

Controls:
UL9 + TTCC*; UL9 + TTCC* + Competitors; UL9
+ TTCC* + distamycin;

Test samples:
UL9 + TTCC* + distamycin + Competitors;

where UL9 is UL9-COOH, TTCC* is the high specific
activity test oligonucleotide, and Competitors are the
low specific activity competitor oligonucleotides.

TTCC-low (the tracer-labeled low specific activity
competitor) competes with TTCC* on an equimolar basis
for the binding of both protein and distamycin. A
competitor molecule with lower affinity for distamycin
than TTCC requires a higher molar ratio to TTCC* to
compete for distamycin binding. The competition for
protein between all competitors is constant. Only the
competition for distamycin varies; the variability is
due to the differential affinity of the competitor
oligonucleotides for distamycin. The concentration of
competitor used in these experiments varies over a
range of concentrations and is determined empirically
by (a) the test molecule concentration, and (b) the
relative affinity of the competitor and the
radiolabeled test oligonucleotide. Typically, the
competitor DNA consists of only the test sequence; that is,
no additional sequences are connected to the test
sequence.

The competition assay described here facilitates
the determination of actual rank between the test
oligonucleotides that are detected as highly effective
molecules in the original assay. The competition assay
also facilitates the detection of false negatives. As
described above, the results of the assay discussed in
Example 10 imply "directional" binding of distamycin,
in which the effect of binding is only detected when
the molecule is bound in one direction with respect to
the UL9 protein. Binding in the opposite direction
(i.e., to the complementary test sequence) is not
detected with the same activity in the assay.

The purpose of this competition experiment is to
use the test oligonucleotides to compete for the
binding of distamycin. If the sequences complementary
to the "best binders" are false negatives in the assay,
they should nonetheless be effective competitors in the
competition assay.

Example 15

A Method of Selecting Target Sequences
From Database Sequence Information

The binding of a drug or other DNA-binding molecule
to the recognition sequence for TFIID, or other
selected transcription factors, is expected to alter
the transcriptional activity of the associated gene.
TATA-boxes, which are the recognition sequences for the
transcriptional regulatory factor TFIID, are associated
with most eukaryotic promoters and are critical for the
expression of most eukaryotic genes. Targeting a
DNA-binding drug to TATA boxes in general would be
undesirable. However, sequences flanking TATA box sequences
are typically unique between genes. By targeting such
flanking sequences, perhaps with one base overlapping
the TFIID recognition site, each gene can be targeted
with specificity using the novel DNA-binding molecules
designed from the data generated from the DNA-binding
drug assay. One method for determining novel and
specific target sequences for novel DNA-binding drugs
is described here. The method may be applied to any
known binding site for any specific transcription
factor, regardless of whether the identity of the
transcription factor itself is known.

TATA-boxes have been determined for a large number
of genes. Typically, the TATA-box consensus sequence
has been identified by examining the DNA sequence 5' of
the RNA start site of a selected gene. However, the
most rigorous determinations of TATA boxes have also
demonstrated the transcription factor binding site by
DNA protection experiments and DNA:protein binding
assays (using electrophoretic methods). Many of these
sites are annotated in the public databases "EMBL" and
"GENBANK", which both contain nucleic acid
sequences. Unfortunately, the flat field listings
of these databases do not consistently annotate these
sites. It is possible, however, to automatically
search a database, using a text parsing language called
AWK, to extract most sequence information that relates
to annotated promoter sequences.

The following is a description of how selected
promoter sites were located in the public database from
"EMBL." The flat field annotations from "EMBL" Version
32, as processed by "INTELLIGENTICS" (Mountain View,
CA), were obtained with the set of UNIX programs called
"IG-SUITE." These programs were executed on a "SUN
IPX" workstation. An AWK script was used to parse all
the primate annotation files listed in the "EMBL"
database. The AWK interpreter is supplied as part of
the system software that comes with the "SUN IPX"
workstation.

The following is a description of how the AWK script
parses annotation files looking for and printing
information relating to promoters and TATA-boxes. The
system is asked to examine the input files for certain
key words in the header lines or annotations to the
sequence. The AWK interpreter reads input files line by
line and executes functions based on patterns found in
each line. In this case, the AWK system read the
annotation files of EMBL. The following is a description
of how the AWK script can be used to parse out
sequences containing TATA-boxes.

The program first examines the files for all
header lines containing the word "complete" but not
"mRNA" or "pseudogene"; the output is printed.
Complete genes sometimes contain the promoter sequences,
but complete mRNA genes do not contain the promoters;
mRNA genes are therefore not of interest for the purpose
of detecting promoter elements. Next, the AWK system
looks for the word "exon 1" and, if it finds it, prints
the header and "DE" line. Then it looks for "5'" and
prints the header line if it does not contain the word
"mRNA". Next it looks for the word "transcription" and,
if it finds it, prints the preceding and following lines
along with the description line.

Next, the AWK system examines the files for the
word "TATA" in the header lines or references. This
result is printed. After this it looks for the word
"promoter" and, if it finds it, prints that line and the
line after it, which contains the information about the
promoter. Then the program looks for "protein_bind"
and prints that line along with the next one. The
description "protein_bind" is usually used to mark
potential binding sites of transcription factors in the
"EMBL" database. AWK then scans for any annotated
primary mRNA start sites. The promoter sequence is
found in front of the start site. Finally, any exon 1
start sites that are annotated in the feature table are
extracted. Exon 1 start sites should also be the
primary transcription start site, and the TATA boxes
usually are found approximately 25-35 base pairs 5' to
the transcriptional start site.
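Once a transcription start coordinate is known, the expected TATA-box window can be scanned directly. The following is a minimal sketch and not part of the original procedure: the motif "TATA", the 0-based coordinates, and the function name `tata_candidates` are assumptions made for illustration, and the example sequence is made up.

```python
def tata_candidates(sequence, start_index, motif="TATA", window=(25, 35)):
    """Return 0-based positions of `motif` whose first base lies between
    window[0] and window[1] bases 5' of the transcription start site."""
    lo = max(0, start_index - window[1])
    hi = start_index - window[0]
    return [i for i in range(lo, hi + 1) if sequence.startswith(motif, i)]


# Made-up sequence: a TATAAA element 30 bases upstream of position 50.
sequence = "G" * 20 + "TATAAA" + "C" * 24 + "ATGGCGGCGG"
start_site = 50

print(tata_candidates(sequence, start_site))
```

Candidates found this way would still need to be confirmed against the database annotations, as the text describes for the manually examined output.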
The actual AWK script is included here as an
example of how to parse a database to extract promoter
sites:

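# Flag: when set to 1, the next input line is printed as well.
BEGIN {print_next_line=0}

# Print the line that follows a previously matched line.
{if (print_next_line==1)
    {print $0
     print_next_line=0}
}

# Remember the current header (">") line; reset the per-locus flag.
{if ($0 ~ /^>/)
    {Locus=$0
     l_flag=0}
}

# Headers marked "complete" that are not mRNA or pseudogene entries.
/^>/ && /[Cc]omplete/ && $0 !~ /mRNA|mrna/ && $0 !~ /pseudogene/ {print}
# Headers that mention exon 1 (not exon 10, 11, ...).
/^>/ && /exon 1[^0-9]/ {print}
# Headers that mention a 5' region and are not mRNA entries.
/^>/ && /5'/ && $0 !~ /mRNA|mrna/ {print}
# Any line mentioning transcription: print header, previous line, this line.
/[Tt]ranscription/ {print Locus "\n" PL "\n" $0; print_next_line=1}

# Feature-table notes that mention TATA.
{if ($0 ~ /^FT/ && $0 ~ /TATA/ && $0 ~ /note/)
    {print Locus "\n" PL "\n" $0}
}

# Feature-table qualifiers that mention transcription.
{if ($0 ~ /^FT/ && $0 ~ /[Tt]ranscription/ && $0 ~ /\//)
    {print Locus "\n" PL "\n" $0}
}

# Feature keys that mention TATA (other than notes).
{if ($2 !~ /note/ && $2 ~ /TATA/) {print Locus "\n" $0}
}

# Promoter features: print the header only once per locus.
{if ($2 ~ /promoter/)
    {print_next_line=1
     if (l_flag==0)
        {print Locus "\n" $0
         l_flag=1}
     else
        print $0
    }
}

# Protein binding sites, together with the following line.
{if ($2 ~ /protein_bind/)
    {print Locus "\n" $0
     print_next_line=1}
}

# Annotated primary transcripts. The location test on this line is only
# partly legible in the source; the pattern shown is a reconstruction.
{if ($2 ~ /prim_transcript/ && $3 ~ /^1\.\.|^<1\.\./)
    {print Locus "\n" $0
     print_next_line=1}
}

# Exon 1 start sites annotated in the feature table.
{if ($0 ~ /^FT/ && $0 ~ /number=1[^0-9]/)
    if (PL ~ /exon/) {print Locus "\n" PL "\n" $0}
}

# Keep the current line as the "previous line" for the next record.
{PL=$0}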
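The "print the matching line and the line after it" behavior used for "promoter" and "protein_bind" features can be sketched in Python for readers less familiar with AWK. This is a simplified illustration, not a port of the script: the function name `features_with_next_line` is hypothetical, each line is emitted at most once, and the sample feature-table lines are invented for the example.

```python
def features_with_next_line(lines, keys=("promoter", "protein_bind")):
    """Collect every line whose second whitespace-separated field contains
    one of `keys`, together with the line that immediately follows it."""
    selected = []
    take_next = False
    for line in lines:
        fields = line.split()
        matched = len(fields) > 1 and any(key in fields[1] for key in keys)
        if matched or take_next:
            selected.append(line)
        # The line after a match is taken on the next iteration.
        take_next = matched
    return selected

# Hypothetical feature-table excerpt (not taken from any database entry).
sample = [
    "FT   promoter        1..50",
    'FT                   /note="TATA box"',
    "FT   CDS             60..200",
    "FT   protein_bind    10..20",
    'FT                   /bound_moiety="SP1"',
    "SQ   Sequence",
]
for line in features_with_next_line(sample):
    print(line)
```

In the sample, the two feature lines and their qualifier lines are kept, while the "CDS" and "SQ" lines are dropped.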

After the AWK script is run on the database the
output is manually examined. Those sites that are
clearly promoter sites are noted and nucleotide
coordinates recorded. Other gene sequences are
examined using the "FINDSEQ" program of "IG_SUITE" to
see if the promoter sites can be determined or if the
references in the database describe the promoter
sequences. If so, those nucleotide coordinates are
noted. At the end of this examination "FINDSEQ" is
used to extract any sequences containing promoter
sequences by using an indirect file of "LOCUS" names
constructed using a text editor.

A parsing program was also written to extract each
of the annotated sites from the file that "FINDSEQ"
extracted from "EMBL." This program extracts the
following information: the promoter site name and four
numbers representing the nucleotide coordinates of
where the sequence is to start, what the coordinate of
the first base of the site is, the coordinate of the
last base of the site and the end of the sequence to be
extracted. A large batch file was constructed to
automatically extract each of the promoter sites.
These sequences formed the basis of Table V.
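The record produced by the parsing program (a site name and four nucleotide coordinates) lends itself to a small worked example. The Python sketch below is an assumption-laden illustration, not the program referred to above: the names `SiteRecord` and `extract_site` are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple

class SiteRecord(NamedTuple):
    name: str        # promoter site name
    seq_start: int   # first base of the sequence to extract (1-based)
    site_start: int  # first base of the site
    site_end: int    # last base of the site
    seq_end: int     # last base of the sequence to extract

def extract_site(locus_seq, rec):
    """Return (extracted window, site within that window) for one record."""
    ordered = (1 <= rec.seq_start <= rec.site_start
               <= rec.site_end <= rec.seq_end <= len(locus_seq))
    if not ordered:
        raise ValueError("coordinates out of order or out of range: %r" % (rec,))
    window = locus_seq[rec.seq_start - 1:rec.seq_end]
    offset = rec.site_start - rec.seq_start
    site_len = rec.site_end - rec.site_start + 1
    return window, window[offset:offset + site_len]

# Hypothetical locus with a TATAAA site at positions 21-26.
locus = "ACGT" * 5 + "TATAAA" + "GGCC" * 5
window, site = extract_site(locus, SiteRecord("demo", 11, 21, 26, 36))
print(window, site)
```

Keeping the site coordinates separate from the window coordinates allows the flanking sequence on either side of the site to be chosen independently.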

The Sequence Listing presents a number of sequences
that are useful as test sequences in the
present invention. SEQ ID NO:1 to SEQ ID NO:481 and
SEQ ID NO:600 correspond to promoter targets (typically,
TATA box-containing sites) for human genes. SEQ ID
NO:482 to SEQ ID NO:599 correspond to promoter targets
for viral genes.
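Entries of the Sequence Listing can be checked mechanically against their declared lengths. The following Python sketch is offered only as an illustration; the helper names `listing_bases` and `check_entry` are hypothetical, and the sequence shown is the one given for SEQ ID NO:1 in the Sequence Listing (declared length 42 base pairs).

```python
def listing_bases(line):
    """Keep only base letters from a listing line, dropping the spaces
    between blocks of ten and any trailing residue count."""
    return "".join(ch for ch in line.upper() if ch in "ACGT")

def check_entry(declared_length, line):
    """Compare the declared length with the counted bases and report the
    0-based position of the first TATA motif (-1 if none is present)."""
    bases = listing_bases(line)
    return {
        "length_ok": len(bases) == declared_length,
        "tata_at": bases.find("TATA"),
    }

SEQ_ID_1 = "GCTCTGCTTG CCAATGTCTT TATAGGTCAC CCGGAAGGCA CG"
print(check_entry(42, SEQ_ID_1))
```

A mismatch between the declared and counted lengths would flag an entry for manual review.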


Example 16 

Using Normalized Values to Determine 
Sequence Specificity and Relative Binding Affinity 

A. The Assay Mixture and Calibrator Samples.

The assay mixture is prepared as described in
Example 10. The concentration of the components can be
varied as described in the Detailed Description.

The assay mixtures containing both UL9 and DNA are
incubated at room temperature for at least 10 minutes
to allow the DNA:protein complexes to form and for the
system to come to equilibrium. At time = 0, the assay
is begun by adding water (control samples) or test
molecule (typically at 1-5 fM, test samples) to the
assay mixtures using a 12-channel micropipettor. After
incubation with drug for 5-120 minutes, samples are
taken and applied to nitrocellulose filters using a
96-well dot blot apparatus (Schleicher and Schuell) held
at 4°C.

Calibrator samples are used to normalize the
results between plates, that is, to take plate-to-plate
variability into account. Calibrator samples are
prepared using 2-fold serial dilutions of DNA in the
assay mixture and incubating duplicate samples in one
column of the 96-well assay plate. The highest
concentration of DNA used is the same concentration
used in the screening samples. In general, calibrator
samples are used in all experiments. However, use of
calibrator samples appears to be less important for
experiments using blocked plates since the variability
between blocked plates is lower than between unblocked
plates.

The calibrator samples are used to normalize the
values between plates as follows. The volume values
(Example 10) for the calibrator samples are obtained
from densitometry. Volume values are plotted against
DNA concentration. The plots are examined to ensure
linearity. The volume values for the points on the
calibrator line are then averaged for each plate. A
factor, designated the normalization factor, is then
determined for each calibrator line. When the
normalization factor is multiplied by the average of the
points on each calibrator line, the product is the same
number for all plates. Usually, the average of the
line averages is used for determining the normalization
factor, although in theory, any of the line average
numbers can be used. The operating assumption in this
analysis is that the differences in the calibrator
samples reflect the differences in adsorption for
each plate. By normalizing to the calibrator samples,
these variations are minimized.

Once the normalizing factor is obtained, all of
the raw volume values for each of the test assays on
the plate are multiplied by the normalizing factor. For
example, if the following data were obtained, the
process of normalization would be as follows:

TABLE XIII

PLATE                DNA CONCENTRATION
NUMBER          0.8     0.4     0.2     0.1    Average

Plate I:       4000    2000    1000     500     1875
Plate II:      4200    2100    1050     525     1969
Plate III:     3800    1900     950     475     1781

                                      Average:  1875



Plate I has a normalization factor of 1; Plate II
has a normalization factor of 1875/1969 = 0.95; Plate
III has a normalization factor of 1875/1781 = 1.05.
The equation used to establish these numbers is as
follows: "Average average"/line average = normalization
factor.

If the normalization factors are different, these
factors are incorporated into the data analysis. The
sample data on each plate is then multiplied by the
normalization factor to obtain normalized volume
values.
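The arithmetic of Table XIII can be reproduced with a few lines of Python. The sketch below is illustrative only; the function name `normalization_factors` is hypothetical, and the calibrator values are those shown in Table XIII.

```python
def normalization_factors(calibrators):
    """Return (factors, line averages, average of line averages).

    factor = "average average" / line average, so that
    factor * line average is the same number for every plate.
    """
    line_avg = {plate: sum(vals) / len(vals) for plate, vals in calibrators.items()}
    grand_avg = sum(line_avg.values()) / len(line_avg)
    factors = {plate: grand_avg / avg for plate, avg in line_avg.items()}
    return factors, line_avg, grand_avg

# Calibrator volume values from Table XIII.
table_xiii = {
    "Plate I": [4000, 2000, 1000, 500],
    "Plate II": [4200, 2100, 1050, 525],
    "Plate III": [3800, 1900, 950, 475],
}
factors, line_avg, grand_avg = normalization_factors(table_xiii)
for plate in table_xiii:
    print(plate, round(line_avg[plate]), round(factors[plate], 2))

# Normalized volume for a hypothetical raw test value of 3000 on Plate II.
print(3000 * factors["Plate II"])
```

Rounded to two decimal places, the factors agree with the values of 1, 0.95 and 1.05 stated in the text.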

B. The Capture/Detection System.

A 96-well dot blot apparatus is typically used to
capture the DNA:protein complexes on a nitrocellulose
filter as described in Example 10.

C. Quantitation of Data.

The autoradiographs of the nitrocellulose filters 
are analyzed as described in Example 10. 

D. Analysis of Data.

After densitometry, the data is analyzed using a
spreadsheet program, such as "EXCEL." For each plate,
the calibrator samples are examined and used to
determine the normalization value. Then, for each test
oligonucleotide, at each drug concentration and/or each
time point, a normalized % score is calculated. The
normalized % score (n%) can be described as follows:

n% = (nT/nC) x 100,

where (i) nT is the densitometry volume of the test
sample multiplied by the normalization factor for the
plate from which the sample was obtained, and (ii) nC
is the densitometry volume of the control sample
multiplied by the normalization factor for the plate
from which the sample was obtained. The oligonucleotides
are then ranked from 1 to 256 based on their n%
scores.
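The n% calculation and the subsequent ranking can be sketched in Python as follows. The function names are hypothetical, the numeric inputs are invented for illustration, and the ranking direction (lowest n% ranked first) is an assumption, since the text states only that the oligonucleotides are ranked by their n% scores.

```python
def normalized_percent(test_volume, control_volume, test_factor, control_factor):
    """n% = (nT / nC) x 100, where each volume is first multiplied by the
    normalization factor of the plate from which it was obtained."""
    n_t = test_volume * test_factor
    n_c = control_volume * control_factor
    return 100.0 * n_t / n_c

def rank_by_score(scores):
    """Map each oligonucleotide name to a rank, 1 = lowest n% (assumed)."""
    ordered = sorted(scores, key=scores.get)
    return {name: position + 1 for position, name in enumerate(ordered)}

# Hypothetical volumes for three test oligonucleotides on one plate.
scores = {
    "oligo_a": normalized_percent(3200, 4000, 1.0, 1.0),
    "oligo_b": normalized_percent(2000, 4000, 1.0, 1.0),
    "oligo_c": normalized_percent(3800, 4000, 1.0, 1.0),
}
print(scores)
print(rank_by_score(scores))
```

When the test and control samples come from the same plate, the normalization factor cancels in the ratio; it matters when the two samples are read from different plates.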

While the invention has been described with
reference to specific methods and embodiments, it will
be appreciated that various modifications and changes
may be made without departing from the invention.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Genelabs Technologies, Inc. 

(B) STREET: 505 Penobscot Drive 

(C) CITY: Redwood City 

(D) STATE: CA 

(E) COUNTRY: USA 

(F) POSTAL CODE: 94063

(ii) TITLE OF INVENTION: Sequence-Directed DNA Binding 
Molecules, Compositions and Methods 

(iii) NUMBER OF SEQUENCES: 641

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Genelabs Technologies, Inc.

(B) STREET: 505 Penobscot Drive 

(C) CITY: Redwood City 

(D) STATE: CA 

(E) COUNTRY: USA 

(F) ZIP: 94063 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/123,936 

(B) FILING DATE: 17-SEP-1993 

(vii) PRIOR APPLICATION DATA: 
(A) APPLICATION NUMBER: US 07/996,783

(B) FILING DATE: 23-DEC-1992 

(viii) ATTORNEY / AGENT INFORMATION: 

(A) NAME: Fabian, Gary R. 

(B) REGISTRATION NUMBER: 33,875 

(C) REFERENCE/DOCKET NUMBER: 4600-0175.41/G19PCT2

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (415) 324-0880 

(B) TELEFAX: (415) 324-0960 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human ferredoxin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
GCTCTGCTTG CCAATGTCTT TATAGGTCAC CCGGAAGGCA CG 42
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human macrophage alpha1-antitrypsin

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 
CCTACTGCCT CCACCCGAAG TCTACTTCCT GGGTGGGCAG GAAC 44 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene B for alpha 1-acid 
glycoprotein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
AGTGACCGCC CATAGTTTAT TATAAAGGTG ACTGCACCCT GCAGCC 46 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene for alpha 1
microglobulin-bikunin



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
ATTGGAGCTG TCCTTGGGGC TGTAATTGGC CCCAGCTGAG CAGGGCA 47
(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for alpha-2 macroglobulin

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CTGTTTGCAC ACAGAGCAGC ATAAAGCCCA GTTGCTTTGG GAAGT 45 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ACAA gene for peroxisomal 
3-oxoacyl-CoA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CTCGGGTTTG GCTACAAAAG GTGGAAAGAC TTCCGGTCTG CATTTCTG 48 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ACAA gene for peroxisomal
3-oxoacyl-CoA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
CAAGGTAGGC GGGGCATTGA GTGGAAAGCT CGGCTGGGCG GTGCCTGT 48 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human choline acetyltransferase gene



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 
GCAATTGTGA CCCACAGCCT AATAATAACA GTCTTTGCCC TCTTGGCC 48 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human angiotensin I-converting 
enzyme gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GGCGGGGGTG TGTCGGGTTT TATAACCCGC AGGGCGGCCG CGGCG 45 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene fragment for the 
acetylcholine receptor gamma 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GGGGTGGGAG TGTAGGCTGT TATATGACAC CCAGAGCCCA TCTCT 45
(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytokine (Act-2) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GTCCTAGGCC TCAGAGTCCC TATAAAGAGA GATTCCCAAC TCAGTA 46 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human beta-actin gene



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GGTGAGTGAG CGGCGCGGGG CCAATCGCGT GCGCCGTTCC GAAAG 45 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human beta-actin gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GAGCGGCCGC GGCGGCGCCC TATAAAACCC AGCGGCGCGA CGCGCCA 47 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cardiac actin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
TGCTCCAACT GACCCTGTCC ATCAGCGTTC TATAAAGCGG CCCTCCTGGA 50 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for vascular smooth 
muscle alpha-actin 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GAGGAGAGCA GGCCAAGGGC TATATAACCC TTCAGCTTTC AGCTTCC 47 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human enteric smooth muscle 
gamma-actin gene



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
AAGATCCGCC TCTGGGGTTT TATATTGCTC TGGTATTCAT GCCA 44
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human arachidonate 12-lipoxygenase
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GCGGGGCCGC AGACCGGTCC TTTAAAGGTT GGAAGTGGCC CCGAGG 46 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alcohol dehydrogenase alpha
subunit (ADH1) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
GGTGTTATTC AAGCAAAAAA AATAAATAAA TACCTATGCA ATACACCT 48 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alcohol dehydrogenase beta
subunit gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GATGTTACAC AAGCAAACAA AATAAATATC TGTGCAATAT ATCTGCTT 48 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha-fetoprotein gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
TAACAGGCAT TGCCTGAAAA GAGTATAAAA GAATTTCAGC ATGATTTTCC 50 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytosolic adenylate kinase
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
ATGCCGCGCG CTGACAGCCT TATAAATAGT CGCCTTTGCC GGCCGCC 47 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human
alpha-N-acetylgalactosaminidase (AK1) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
CGGACTTATC AGGTTACCGG ATTCGAGTCA GAAGCGGCGG CAGGTCTGAA 50 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ALAD gene for porphobilinogen 
synthase 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
ATAAAGACCT TTGATCGGAT CTATCATTGT ACCTATCATA GGTCTG 46 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human ALAD gene for porphobilinogen 
synthase 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
CCCTACCAAG GAGGAAGACT GGATAAAATG GCCTGAGATG GCTGAA 46 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 58 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human albumin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 
GAGAGTGACA AAGGCCTGAA TTTGTCAATT AGTAACAATT GTATTCAACA GTAAGGAT 58 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 
CTGCTCACCA CACACAAGTG TTATAGGAGG AGTCTGGCCC TTGAG 45 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase C gene for fructose
1,6-bisphosphate aldolase

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 
ACCTGCAATA CCCCCTTACC CCAATACCAA GACCAACTGG CATAG 45 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 70 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase C gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
GGCATAGAGC CAACTGAGAT AAATGCTATT TAAATAAAGT GTATTTAATG AATTTCTCCA 60
AGCTTACGGA 70

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CCTCCACACG TCAACGATTC TATTTGAAGT TGGGCAGGGG GTGGC 45
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 
ATTAGAGAAG ATCGGGGACA CATGTGGGGC TGGGCAGGAG CTG 43
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
GGGCTGGGCA GGAGCTGCCT TATAACCACC CGGGAACCCC TAGCT 45 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human aldolase A gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 

GCGGAGGGCG GAGTGGTGCC TTTAAAAGGC CGGCGCCGCC TTCCGC 46 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 45 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
TGCGCCGCCC CTTCCGAGGC TAAATCGCTT CCTCTCGGAA CGCGC 45 
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(vi) ORIGINAL SOURCE: 
(C) INDIVIDUAL ISOLATE: Human aldolase B gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
AAAAAACATG ATGAGAAGTC TATAAAAATT GTGTGCTACC AAAGA 45 
(2) INFORMATION FOR SEQ ID NO:35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human leukemia inhibitory factor 
(LIF) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
CTTACAACAC AGGCTCCAGT ATATAAATCA GGCAAATTCC CCATTTGAGC 50 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aminopeptidase N gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
GGGGCTCCTC CCCTTTGGGG ATATAAGCCC GGCCTGGGGC TGCTCC 46 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha-amylase gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 
AAATGTGCTT CTTACAGGAA TATAAATAGT TTCTGGAAAG GACACTG 47 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human amyloid-beta protein gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
GGGAGGCCTG CGGGGTCGGA TGATTCAAGC TCACGGGGAC GAGCAGG 47 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human amyloid beta protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
CGGGGACGAG CAGGAGCGCT CTCGACTTTT CTAGAGCCTC AGCGTCCTAG GACT 54 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human amyloid-beta protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:
GCGGGGTGGG CCGGATCAGC TGACTCGCCT GGCTCTGAGC CCCGCCG 47 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human amyloid-beta protein (APP) 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 

TCAGCTGACT CGCCTGGCTC TGAGCCCCGC CGCCGCGCTC GGGCTCCGTC 50 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human pronatriodilatin precursor 
gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
TGCTTGGAGA GCTGGGGGGC TATAAAAAGA GGCGGCACTG GGCAGC 46 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for atrial natriuretic 
factor 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
TTGAAGTGGG AGCCTCTTGA GTCAAATCAG TAAGAATGCG GCTCTTGCA 49
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:
(C) INDIVIDUAL ISOLATE: Human gene for atrial natriuretic 
factor 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
CTGCGGATGA TAACTTTAAA AGGGCATCTC CTGCTGGCTT CTCACTTGG 49 
(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for atrial natriuretic 
factor 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
TGCTTGGAGA GCTGGGGGGC TATAAAAAGA GGCGGCACTG GGCAGC 46 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human atrial natriuretic factor

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:
CTTGGAGAGC TGGGGGGCTA TAAAAAGAGG CGGCACTGGG CAGCTGGGAG 50 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human angiotensinogen gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
CTCCATCCCC ACCCCTCAGC TATAAATAGG GCCTCGTGAC CCGGCC 46 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human heart/skeletal muscle ATP/ADP
translocator gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
TCGCGAGAGC CCGGCGGGGA TATAAGGGGG AGCTGCGGGC CAGGC 45 
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein CIII gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
TCTGGACACC CTGCCTCAGG CCCTCATCTC CACTGGTCAG CAGGTGACC 49 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein CIII gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:
CTCAGGCCCT CATCTCCACT GGTCAGCAGG TGACCTTTGC CCAGCGCCC 49 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein CIII gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
TGCCTGCTGC CCTGGAGATG ATATAAAACA GGTCAGAACC CTCCTGCC 48 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein CIII gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
GACACCCTGC CTCAGGCCCT CATCTCCACT GGTCAGCAGG TGACCTTTGC 50 
(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein CIII gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
TGCCTGCTGC CCTGGAGATG ATATAAAACA GGTCAGAACC CTCCTGCC 48 
(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein AII gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

ATAATCCCTG CCCCACTGGG CCCATCCATA GTCCCTGTCA CCTGACAGG 49 

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein AII gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
GGGGTGGGTA AACAGACAGG TATATAGCCC CTTCCTCTCC AGCCAG 46 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human fetal gene for apolipoprotein 
AI precursor

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
CTGCAGACAT AAATAGGCCC TGCAAGAGCT GGC 33 
(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein B gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
GCCTGGGCTT CCTATAAATG GGGTGCGGGC GCCGGCCGC 39 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apoC-II gene for 
preproapolipoprotein C-II

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
CGGAAGTGGG TCTCAACCAC TATAAATCCT CTCTGTGCCC GTCCGGA 47
(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human apolipoprotein C-I (VLDL) gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
TGCCCCGCCC CTCCCCAGCC TGATAAAGGT CCTGCGGGCA GGACAGG 47 
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein D gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
ATCAGAGACC TGAAGAAGCT TATAAAATAG CTTGGGAGAG GCCAGTC 47 
(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human arginase gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
GGTTGTTTAT TCAACCCAAG TATAAATGGA AAAAAAAGAT GCGCC 45 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human argininosuccinate synthetase
gene




(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62: 
CTGCCCCCGG GCCCTGTGCT TATAACCTGG GATGGGCACC CCTGC 45 
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human sodium/potassium ATPase alpha 
3 subunit (ATP1A3)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 
CCCCTCCCGC GGACGCGGGC ATATGAGGAG GCGGAGGCGG CGGC 44 
(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human (BSF-2/IL6) gene for B cell
stimulatory factor-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:

ATTAGAGTCT CAACCCCCAA TAAATATAGG ACTGGAGATG TCTGAGGC 48 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 51 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human C5 gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:65: 
TCTGAATTCT TCAAGTTCAG TTTATTTAAA AGGAGACTAT CCTCAAAAGT G 51 
(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human carbonic anhydrase II gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:

CCTCCCCTTG TCGCCTAGGT CCACCCGAGC CCCCTCCCCC GGGCC 45 

(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human carbonic anhydrase II gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
GCACGAAGTT GGCGGGAGCC TATAAAAGCG GGCCGGCGCG ACCCGC 46 
(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human calcitonin/alpha-CGRP gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:
TTCCCGACCC ACAGCGGCGG GAATAAGAGC AGTCGCTGGC GCTGG 45 
(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human calretinin gene, exon 1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 
CAGGCGCAGG CTCCAGAGCG TATATAAGGG CAGCGTGGCG CACAACC 47 
(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cathepsin G gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:
TTCCTTCCTC TCTCAGGGCC TTAAAGTCTA GGAGGAGGAA GCACA 45 
(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human carbonic anhydrase VII (CA 
VII) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
CTCCTCCCGC CAGCCGCTGC TTTAAGAGGC TGCTCCGCGG TAGOG 45 
(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cardiac beta myosin heavy 
chain gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
TCTAGTGACA ACAGCCCTTT CTAAATCCGG CTAGGGACTG GGTGCC 46 
(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cardiac beta myosin heavy 
chain gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
TGGGGGTGCC TGCTGCCCCA TATATACAGC CCCTGAGACC AGGTC 45 
(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human complement C3 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
TGGGGGAAAG CAGGAGCCAG ATAAAAAGCC AGCTCCAGCA GGCGCTG 47 
(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human recognition/surface antigen
(CD4) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
CAAGTCCTCA CACAGATACG CCTGTTTGAG AAGCAGCGGG CAAGAAAGAC 50 
(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human hyaluronate receptor gene
(CD44)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
TAGGTCACTG TTTTCAACCT CGAATAAAAA CTGCAGCCAA CTTCCGAGGC 50 
(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cystic fibrosis transmembrane 
conductance reg. gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 
AATGACATCA CAGCAGGTCA GAGAAAAAGG GTTGAGCGGC AGGCACCCAG 50 
(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cholesterol 
7-alpha-hydroxylase (CYP7) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:
ATGGATCTGG ATACTATGTA TATAAAAAGC CTAGCTTGAG TCTCTT 46 
(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human choline acetyltransferase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 
AGCAATTGTG ACCCACAGCC TAATAATAAC AGTCTTTGCC CTCTTGGCC 49 
(2) INFORMATION FOR SEQ ID NO: 80: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human mast cell chymase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:
CTCTCTTGCC TTCTAGGAGT TATAAAACCC AAGACTGGAA AGGAAA 46 
(2) INFORMATION FOR SEQ ID NO: 81: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human heart chymase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:
CCTCTCTTGC CTTCTGGGAG TTATAAAACC CAAGACTGGA AGGAAAA 47 
(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human creatine kinase B gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:
GGCCAATGGA ATGAATGGGC TATAAATAGC CGCCAATGGG CGGCCCGC 48 
(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human C-type natriuretic peptide 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 
ACATCAGCGG CAGGTTGGAT TATAAAGGCG CGAGCAGAGT CACGGG 46 
(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human transmembrane protein (CD59) 
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:
GCTCCGCGCG GGGGTGGAGG GAGAGGAGGA GGTTCCTGCC GAGGT 45 
(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human transmembrane protein (CD59)
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:85: 
GAGGGCAAGG GCATCCTGAG GGGCGGGGCC GGGGGCGGAG CCTTGC 46 
(2) INFORMATION FOR SEQ ID NO:86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human transmembrane protein (CD59) 
gene 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 
ATCCTGAGGG GCGGGGCCGG GGGCGGAGCC TTGCGGGCTG GAGCGA 46 
(2) INFORMATION FOR SEQ ID NO: 87:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human transmembrane protein (CD59) 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 
TGAGGGGCGG GGCCGGGGGC GGAGCCTTGC GGGCTGGAGC GAAAGAATGC 50 
(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human myeloid specific CD11b gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:

GCCCTCTTCC TTTGAATCTC TGATAGACTT CTGCCTCCTA CTTCTC 46 

(2) INFORMATION FOR SEQ ID NO: 89:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cholesteryl ester transferase 
protein (CETP) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 
GTGGGGGCTG GGCGGACATA CATATACGGG CTCCAGGCTG AACGGC 46 
(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human cystic fibrosis transmembrane 
conductance regulator 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90:
TGGGTGGGGG GAATTGGAAG CAAATGACAT CACAGCAGGT CAGAG 45 
(2) INFORMATION FOR SEQ ID NO: 91: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cystic fibrosis transmembrane 
conductance regulator 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 
GTGGGGGGAA TTGGAAGCAA ATGACATCAC AGCAGGTCAG AGAAAAA 47 
(2) INFORMATION FOR SEQ ID NO: 92: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human coseg gene for
vasopressin-neurophysin precursor




(xi) SEQUENCE DESCRIPTION: SEQ ID NO:92: 
CACGGGAACA CCTGCGGACA TAAATAGGCA GCCAGCAGAG GCAGCA 46 
(2) INFORMATION FOR SEQ ID NO: 93: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human creatine kinase gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: 
TTCAGAGAAA GGGCAGGTGC TATAAAGGGC CCAGCGCCAC GGGCCT 46 
(2) INFORMATION FOR SEQ ID NO:94: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha-B-crystallin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94:
AGAAGCTTCA CAAGACTGCA TATATAAGGG GCTGGCTGTA GCTGCAG 47 
(2) INFORMATION FOR SEQ ID NO: 95: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human C3 gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:
AGTGGGGGAA AGCAGAGCCA GATAAAAAGC CAGCTCCAGC AGGCGCTGCT 50 
(2) INFORMATION FOR SEQ ID NO: 96: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human colony stimulating factor 
CSF-1 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:
GCCTGGCCAG GGTGATTTCC CATAAACCAC ATGCCCCCCA GTCCTC 46 
(2) INFORMATION FOR SEQ ID NO: 97:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytotoxic serine proteinase 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 
GTTACTCAGC AGCAGGGGTG TAAATGTGAC AGTGCCATGT CAAC 44 
(2) INFORMATION FOR SEQ ID NO:98: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human CST3 gene for cystatin C

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98:
GGCGGCGAAG GCCGGAAGGG ATAAAACCGC AGTCGCCGGC CTCGCG 46 
(2) INFORMATION FOR SEQ ID NO: 99: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human CST4 gene for Cystatin D 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: 
TTGGGGGACA CCCAAGTAGG ATAAATGCAC AGCTAGCTTC TGGCC 45 
(2) INFORMATION FOR SEQ ID NO: 100: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human CYP2C8 gene for cytochrome 
P-450

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:
ACTAAATTAG CAGGGAGTGT TATAAAAACT TTGGAGTGCA AGCTC 45 
(2) INFORMATION FOR SEQ ID NO: 101:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cholesterol desmolase 
cytochrome gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:
AGCAGGAGGA AGGACGTGAA CATTTTATCA GCTTCTGGTA TGGCC 45 
(2) INFORMATION FOR SEQ ID NO: 102: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cholesterol desmolase 
cytochrome P-450 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:
TATGGCCTTG AGCTGGTAGT TATAATCTTG GCCCTGGTGG CCCAGG 46 
(2) INFORMATION FOR SEQ ID NO: 103: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human steroid 11-beta-hydroxylase
(CYP11B1) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103: 
GAAGGCAAGG CACCAGGCAA GATAAAAGGA TTGCAGCTGA ACAGGGT 47 
(2) INFORMATION FOR SEQ ID NO: 104: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human CYPXI gene for steroid 
18-hydroxylase (P-450 C18)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:
CAGAGCAGGT TCCTGGGTGA GATAAAAGGA TTTGGGCTGA ACAGGGT 47 
(2) INFORMATION FOR SEQ ID NO: 105: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human CYPXIX aromatase P-450 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105: 
TGGACAATAA ATGAAATCTC CATAAAAGGC CCAAAGGACA GGGTTC 46 
(2) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human decay-accelerating factor 
(DAF) gene




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106: 
AGCCCAGACC CCGCCCAAAG CACTCATTTA ACTGGTATTG CGGAGC 46 
(2) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human dopamine beta-hydroxylase gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107: 
ACGTCCATGT GTCATTAGTG CCAATTAGAG GAGGGCAGCA GGCTG 45 
(2) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human dopamine beta-hydroxylase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108:

ACCCCATTCA GGACCAGGGC ATAAATGGCC AGGTGGGACC AGAGAG 46 

(2) INFORMATION FOR SEQ ID NO: 109: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human desmin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109: 
GGGGCTGATG TCAGGAGGGA TACAAATAGT GCCGACGGCT GGGGGC 46 
(2) INFORMATION FOR SEQ ID NO: 110: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytokeratin 8 (CK8) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:
CCCGGGGCTG GGATCTCTTT TATAAAAGGC CATTCCTGAG AGCTC 45 
(2) INFORMATION FOR SEQ ID NO: 111: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human DNA polymerase alpha gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111: 
GCCTCCCGAG CCGCTGATTG GCTTTCAGGC TGGCGCCTGT CTCGGCCCCC 50 
(2) INFORMATION FOR SEQ ID NO:112: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human dopamine D1A receptor gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:
GCTGTGCCCC CCGGGAACCC CGCCGGCCTG TGCGCTTGCT GGTGCCAGCT 50 
(2) INFORMATION FOR SEQ ID NO: 113:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human eosinophil cationic protein
(ECP) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113: 
AGACCCACCA AGGGAAGCTT TATTTAAACA GTTCCAAGTA GGGGAGA 47 
(2) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human HER2 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114:
GAGGAGGAGG GCTGCTTGAG GAAGTATAAG AATGAAGTTG TGAAGCTGAG 50
(2) INFORMATION FOR SEQ ID NO: 115: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human elastin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115: 
GTGTCTCGCT GTGATAGATC AATAAATATT TTATTTTTTG TCCTGG 46 
(2) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human endothelial leukocyte adhesion 
molecule I (ELAM-1)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116:
ATTCACAGGA AGCAATCCCT CCTATAAAAG GGCCTCAGCC GAAGTAGTG 49 
(2) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human eosinophil major basic protein
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117: 
GGAAGTTCCT CCAAGGCCTC TATATAAGAA GTCTTTGTGA GAGGAAG 47 
(2) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human preproenkephalin B gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118:
CTCTAGGAAA GTTTCTCAGC TCTCAAACCT CTGTTTTCTC ATCTGCAAG 49 
(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human preproenkephalin B gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:
TTCTCATCTG CAAGATGGGG ATAATATTAA CCAACTGGCT AGGTCATGAG 50 
(2) INFORMATION FOR SEQ ID NO: 120: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ENO3 gene for muscle-specific
enolase

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120:
GGGGACCGAG TGGCTCAGGG ATAAATGCGC ACCTGAGAGG GGGTGA 46 
(2) INFORMATION FOR SEQ ID NO: 121: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human eosinophil derived neurotoxin 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121: 
CAACCCACCA AGGGATGCTT TATTTAAACA GTTCCAAGTA GGGGAGA 47
(2) INFORMATION FOR SEQ ID NO: 122: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human erythropoietin receptor

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122:
TACCCAGGCT GAGTGCTGGC CCCGCCCCCT CGGGGATCTG CCACTT 46 
(2) INFORMATION FOR SEQ ID NO: 123: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human c-erb B2/neu protein gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123: 
AGGAGGGCTG CTTGAGGAAG TATAAGAATG AAGTTGTGAA GCTGA 45 
(2) INFORMATION FOR SEQ ID NO: 124: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ERCC2 gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:
CCGATTGGCT CTGCCCTAGC GGATTGACGG GCAGGTTAGC CAATGGTCT 49 
(2) INFORMATION FOR SEQ ID NO: 125: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ERCC2 gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125:
CAGGTTAGCC AATGGTCTCG TAATATAGGT GGAGCGAGCC CTCGAGG 47 
(2) INFORMATION FOR SEQ ID NO: 126: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human erythropoietin gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:
GGTCACCCGG CGCGCCCCAG GTCGCTGAGG GACCCCGGCC AGGCGCGGAG 50 
(2) INFORMATION FOR SEQ ID NO: 127: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human oestrogen receptor 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:127: 
ATATGAGCTC GGGAGACCAG TACTTAAAGT TGGAGGCCCG GGAGCCCA 48 
(2) INFORMATION FOR SEQ ID NO: 128: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human elastase I gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:
AGCTTTGCTG CTAAGAGGAG TATAAAGAGG GCTTGGTCCA AGCAAG 46 
(2) INFORMATION FOR SEQ ID NO: 129: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human fibrinogen gamma chain gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:129: 
GGCCCCGTGA TCAGCTCCAG CCATTTGCAG TCCTGGCTAT CCCA 44 
(2) INFORMATION FOR SEQ ID NO: 130: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human fibrinogen gamma chain gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:
TGGCTATCCC AGGAGCTTAC ATAAAGGGAC AATTGGAGCC TGAGA 45
(2) INFORMATION FOR SEQ ID NO: 131:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human lymphocyte IgE receptor gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131:
TTAACATCTC TAGTTCTCAC CCAATTCTCT TACCTGAGAA ATGGA 45 
(2) INFORMATION FOR SEQ ID NO: 132: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human lymphocyte IgE receptor gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:
GTTATCCGGG TGGCAAGCCC ATATTTAGGT CTATGAAAAT AGAAGCT 47 
(2) INFORMATION FOR SEQ ID NO: 133: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human lymphocyte IgE receptor gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133: 
AGCCCATATT TAGGTCTATG AAAATAGAAG CTGTCAGTGG CTCTAC 46 
(2) INFORMATION FOR SEQ ID NO: 134: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apoferritin H gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134:
GGGCCTGACG CCGACGCGGC TATAAGAGAC CACAAGCGAC CCGCA 45 
(2) INFORMATION FOR SEQ ID NO: 135: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human fibrinogen beta gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:135: 
TATTAACTAA GGAAAGGTAA CCATTTCTGA AGTCATTCCT AGCAGA 46 
(2) INFORMATION FOR SEQ ID NO: 136: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human fibrinogen beta gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:
ATTCCTAGCA GAGGACTCAG ATATATATAG GATTGAAGAT CTCTCAGTT 49 
(2) INFORMATION FOR SEQ ID NO: 137: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human factor IX gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137: 
CCAGAAGTAA ATACAGCTCA GCTTGTACTT TGGTACAACT AATCGACCTT 50 
(2) INFORMATION FOR SEQ ID NO: 138: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human FK506 binding proteins 12A,
12B, and 12C (FKBP12)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:
GAGCCGTGGA ACCGCCGCCA GGTCGCTGTT GGTCCACGCC GCCCGTCGCG 50 
(2) INFORMATION FOR SEQ ID NO: 139: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human 5-lipoxygenase activating
protein (FLAP) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:139: 
TTGTGCCGGG GATCTTCAGA AATTGTAATG ATGAAAGAGT GCAAGCTCTC 50 
(2) INFORMATION FOR SEQ ID NO: 140: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human fos proto-oncogene (c-fos) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:
ATTCATAAAA CGCTTGTTAT AAAAGCAGTG GCTGCGGCGC CTCGTACTCC 50 
(2) INFORMATION FOR SEQ ID NO: 141: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human GOS2 gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141: 
GGCGTGTCTC AGAGAAAAGA TATAAGCGGC CCCCGGACGC TAAAG 45 
(2) INFORMATION FOR SEQ ID NO: 142: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human granulocyte colony-stimulating 
factor gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:
CAGGCCTCCA TGGGGTTATG TATAAAGGGC CCCCTAGSGC TGGGCC 46
(2) INFORMATION FOR SEQ ID NO: 143: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human EGR2 gene



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143: 
CGGGTATTGA AGACCTGCCC ATAAATACTT AGAGCAACAC TTTCCGTC 48 
(2) INFORMATION FOR SEQ ID NO: 144: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human growth hormone (hGH) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144:
[...] TATAAAAAGG GCCCACAAGA GACCAG 46
(2) INFORMATION FOR SEQ ID NO: 145:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gastric inhibitory polypeptide 
(GIP) mRNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:
TAATCAGCAG GTCTATGCCT AATATAAAGG AGCTGGGGCA TGATTTCTTC 50 
(2) INFORMATION FOR SEQ ID NO: 146: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human GLA gene for
alpha-D-galactosidase A

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146:
GAAACAATAA CGTCATTATT TAATAAGTCA TCGGTGATTG GTCCGC 46 
(2) INFORMATION FOR SEQ ID NO: 147:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human glucagon gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147: 

TTTACAGATG AGAAATTTAT ATTGTCAGCG TAATATCTGT GAGG 44 

(2) INFORMATION FOR SEQ ID NO: 148: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human glucagon gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148:

GGCTAAACAG AGCTGGAGAG TATATAAAAG CAGTGCGCCT TGGTGCA 47 

(2) INFORMATION FOR SEQ ID NO: 149: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 47 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human granulocyte-macrophage colony 
stimulating factor gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149: 
CATTAATCAT TTCCTCTGTG TATTTAAGAG CTCTTTTGCC AGTGAGC 47 
(2) INFORMATION FOR SEQ ID NO: 150: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human glucocorticoid receptor gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150:
TGGGCAATGG GAGACTTTCT TAAATAGGGC TCTCCCCCCA CCCATG 46 
(2) INFORMATION FOR SEQ ID NO: 151: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human growth hormone releasing 
factor (GRF) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151:
AACGCTTAGG AAAATGAAGA GATAAATGAT GGGAACGCCA GGCGGCTGCC 50 
(2) INFORMATION FOR SEQ ID NO: 152: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human GST pi gene for glutathione 
S-transferase pi

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152:
GAGCGGGGCG GGACCACCCT TATAAGGCTC GGAGGCCGCG AGGC 44
(2) INFORMATION FOR SEQ ID NO: 153: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: human glycophorin C (GPC) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:153: 
CAGAAGTGGG CGGGTGTGTG TTTAAAAAAA AAAAAAGGGG TGGAAAC 47 
(2) INFORMATION FOR SEQ ID NO: 154: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human histone (H10) gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154:
CGCGGTCCGC CCGCCGCCGC TAAATACCCG GATGCGCCGC CCAAGC 46 
(2) INFORMATION FOR SEQ ID NO: 155: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 44 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for H1 RNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155: 
GTCTTTGGAT TTGGGAATCT TATAAGTTCT GTATGAGACC ACTC 44 
(2) INFORMATION FOR SEQ ID NO:156: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human H1 histone gene FNC16

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156:
GGCGGTGGAT TGGACGCTCC ACCAATCACA GGGCAGCGCC GGCTTA 46 
(2) INFORMATION FOR SEQ ID NO: 157: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human histone gene FNC16 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157: 
ACCAATCACA GGGCAGCGCC GGCTTATATA AGCCCGGGCC CGAGCATAGC AGCA 54 
(2) INFORMATION FOR SEQ ID NO: 158: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human H2B.2 and H2A.1 genes for 
Histone H2A and H2B 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158:
TTTTCGCGCC CAGCAGCTGC TATAAAATGC GCGTCCCTGT AGGTTCC 47 
(2) INFORMATION FOR SEQ ID NO: 159: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human H4/a gene for H4 histone 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159:
GGGGGCAGGG GTAACGTAGA TATATAAAGA TCGGTTTCCT ATTCTCTC 48 
(2) INFORMATION FOR SEQ ID NO: 160: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human H4/b gene for H4 histone 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160:
CTGCAAGTAT AGTGTGTGTG TATATATATA TATATACCTA GCAGTATTTA TTAAAT 56 
(2) INFORMATION FOR SEQ ID NO: 161: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human androgen receptor gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:161: 
GGTGGGGGCG GGACCCGACT CGCAAACTGT TGCATTTGCT CTCCACCTCC 50 
(2) INFORMATION FOR SEQ ID NO: 162: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human chorionic gonadotropin (hCG) 
beta subunit 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162:
GCCCTCTCTC ATTGGGCAGA AGCTAAGTCC GAAGCCGCGC CCCTCCTGG 49 
(2) INFORMATION FOR SEQ ID NO: 163: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human islet amyloid polypeptide 
(hIAPP) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163:
GCTGAGAAAG GTGTGAGGGG TATATAAGAG CTGGATTACT AGTTAGCAAA 50
(2) INFORMATION FOR SEQ ID NO: 164: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human H4 histone gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164:
CTTCCCGCCG GCGCGCTTTC GGTTTTCAAT CTGGTCCGAT ATCTCTGTAT AT 52 
(2) INFORMATION FOR SEQ ID NO: 165: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human H4 histone gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165:
AATCTGGTCC GATATCTCTG TATATTACGG GGAAGACGGT GACGCTC 47 
(2) INFORMATION FOR SEQ ID NO: 166: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human histone H2a gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166:

TCCTCTTTTC TTGGCGAACT CAACTGGTAT GAATTCCTCA 40 

(2) INFORMATION FOR SEQ ID NO: 167: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human histone H2a gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167:
CACAGCCTAC CTCCAGTCAG TATAAATACT TCTCTGCCTT GCGTTC 46 
(2) INFORMATION FOR SEQ ID NO: 168: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human histone H2b gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168:
TATTTGCATA AGCGATTCTA TATAAAAGCG CCTTGTCATA CCCTGCT 47 
(2) INFORMATION FOR SEQ ID NO: 169: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human histone H3 gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:169: 
ATTTTTGAAT TTTCTTGGGT CCAATAGTTG GTGGTCTGAC TCTAT 45 
(2) INFORMATION FOR SEQ ID NO: 170: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human histone H3 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170:
CAATAGTTGG TGGTCTGACT CTATAAAAGA AGAGTAGCTC TTTCCTT 47 
(2) INFORMATION FOR SEQ ID NO: 171: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human HLA-A1 gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171: 
AGTGTCGTCG CGGTCGCTGT TCTAAAGTCC GCACGCACCC ACCGG 45 
(2) INFORMATION FOR SEQ ID NO: 172: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human HLA-B27 antigen gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172:
AGTGTCGCCG GGGTCCCAGT TCTAAAGTCC CCACGCACCC ACCCGG 46 
(2) INFORMATION FOR SEQ ID NO: 173: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human HLA-Bw57 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173:
AGCGTCGCCG CGGTCCCAGT TCTAAAGTCC CCACGCACCC ACCCG 45 
(2) INFORMATION FOR SEQ ID NO: 174: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human HLA-F gene for human leukocyte
antigen F

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174:
TGTCGCCGCA GTTCCCAGGT TCTAAAGTCC CACGCACCCC GCGGGA 46
(2) INFORMATION FOR SEQ ID NO: 175: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for histocompatibility 
antigen HLA-A3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 175: 
AGTGTCGTCG CGGTCGCTGT TCTAAAGCCC GCACGCACCC ACCGGG 46 
(2) INFORMATION FOR SEQ ID NO: 176: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene for class I
histocompatibility antigen HLA-CW3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176:
CATTGGGTGT CGGACCTCTA GAAGGCCGGT CAGCGTCTCC GC 42 
(2) INFORMATION FOR SEQ ID NO: 177: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human HMG-17 gene for non-histone 
chromosomal protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177: 
CGGTCCGGGG CTCCCAGCGC TATAAAAACT TTATAAACCC CCCGGA 46 
(2) INFORMATION FOR SEQ ID NO: 178: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human HOX3D gene for homeoprotein
HOX3D

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178:
AAGAAAGAGA TATCTCCACC TATAAATTGT CCACTTTGGA GAACAA 46 
(2) INFORMATION FOR SEQ ID NO: 179: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human 71Kd heat shock cognate 
protein (hsc70) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179:
TGGAAGGTTC TAAGATAGGG TATAAGAGGC AGGGTGGCGG GCGGA 45 
(2) INFORMATION FOR SEQ ID NO: 180: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human heat shock protein (hsp 70)
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180:
AAGGCGGGTC TCCGTGACGA CTTATAAAAG CCCAGGGGCA AGCGGTCCGG 50 
(2) INFORMATION FOR SEQ ID NO: 181: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human hsp70B gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181: 
CTTCGGTCTC ACGGACCGAT CCGCCCGAAC CTTCTCCCGG GGTCAG 46 
(2) INFORMATION FOR SEQ ID NO: 182: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human hsp70B gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182:
CCGCCCGGCT GACTCAGCCC GGGCGGGCGG GCGGGAGGCT CTCGAC 46 
(2) INFORMATION FOR SEQ ID NO: 183: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: YES

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human hsp70B gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183: 
CCGGCTGACT CAGCCCGGGC GGGCGGGCGG GAGGCTCTCG ACTGGG 46 
(2) INFORMATION FOR SEQ ID NO: 184: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human hsp70B gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184:
CTGACTCAGC CCGGGCGCGC GGGCGGGAGG CTCTCGACTG GGCGGG 46
(2) INFORMATION FOR SEQ ID NO: 185:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human hsp70B gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185: 
GGGCGGGCGG GAGGCTCTCG ACTGGGCGGG AAGGTGCGGG AAGGT 45 
(2) INFORMATION FOR SEQ ID NO: 186: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 53 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human hsp70B gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186:
CGGCGGGGTC GGGGAGGTGC AAAAGGATGA AAAGCCCGTG GACGGAGCTG AGC 53 
(2) INFORMATION FOR SEQ ID NO: 187: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human IAPP gene for islet amyloid 
polypeptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187: 
GCTGAGAAAG GTGTGAGGGG TATATAAGAG CTGGATTACT AGTTAGC 47 
(2) INFORMATION FOR SEQ ID NO: 188: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human intercellular adhesion 
molecule 1 (ICAM-1) gene 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188:
AGGTTTCCGG GAAAGCAGCA CCGCCCCTTG GCCCCCAGGT GGCTAG 46
(2) INFORMATION FOR SEQ ID NO: 189:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human intercellular adhesion 
molecule 1 (ICAM-1) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189: 
GGCCCCCAGG TGGCTAGCGC TATAAAGGAT CACGCGCCCC AGTCGA 46 
(2) INFORMATION FOR SEQ ID NO: 190: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interferon-inducible gene 
IFI-54K 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190:
AAAGGAACCA GAGGCCACTG TATATATAGG TCTCTTCAGC ATTTATTG 48 
(2) INFORMATION FOR SEQ ID NO: 191: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interferon alpha gene 
IFN-alpha 14 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191:
ATGGAAGCXA GTATGTTCCT TATTTAAGAC CTATGCACAG AGCAAGGT 48 
(2) INFORMATION FOR SEQ ID NO: 192: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human interferon alpha gene
IFN-alpha 16

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192:
GAAATTAGTA TGTTCACTAT TTAAGAACTA TGCACAGAGC AAAGT 45 
(2) INFORMATION FOR SEQ ID NO: 193: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interferon alpha gene 
IFN-alpha 5 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:193: 
ATGGAAACTC GTATGTGACC TTTTTAAGAT CTGTGCACAA AACAAGGT 48 
(2) INFORMATION FOR SEQ ID NO: 194: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human interferon alpha gene
IFN-alpha 6

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194:
ATGGAAACTA GTATGTTCCC TATTTAAGAC CTACACATAA AGCAAGGT 48 
(2) INFORMATION FOR SEQ ID NO: 195: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interferon alpha gene 
IFN-alpha 7 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195: 
ATGGAAATTA GTATGTTCAC TATTTAAGAC CTATGCACAG AGCAAAGT 48 
(2) INFORMATION FOR SEQ ID NO: 196: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human immune interferon (IFN-gamma)
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 196:
TCCTCAGGAG ACTTCAATTA GGTATAAATA CCAGCAGCCA GAGGAGGTGC 50
(2) INFORMATION FOR SEQ ID NO: 197: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha/beta-interferon
(IFN)-inducible 6-16 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:197: 
GGGAGGATCC ACAAGTGATG ATAAAAAGCC AGCCTTCAGC CGGAG 45 
(2) INFORMATION FOR SEQ ID NO:198:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human insulin like growth factor II
(IGF-2)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 198:
CTGGGAGGAG TCGGCTCACA CATAAAAGCT GAGGCACTGA CCAGCCT 47 
(2) INFORMATION FOR SEQ ID NO: 199: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human insulin-like growth factor 
binding protein gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 199:

GTGGCGCGGC CTGTGCCCTT TATAAGGTGC GCGCTGTGTC CAGCG 45 

(2) INFORMATION FOR SEQ ID NO:200: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human germline leader peptide and
variable region of 1154

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 200:
CAACCTCCTG CACTGAAGCC TTATTAATAG GCTGGCCACA CTTCATGC 48 
(2) INFORMATION FOR SEQ ID NO: 201:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human germline for leader peptide 
variable region of 2908 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201: 
CAACCTCCTG CCCTGAAGAC TTATTAATAG GCTGGACACA CTTCATGC 48 
(2) INFORMATION FOR SEQ ID NO: 202: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human rearranged kappa
immunoglobulin subgroup V

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 202:
CCACGACCAG GTGTTTGGAT TTTATAAACG GGCCGTTTGC ATTGTGAA 
(2) INFORMATION FOR SEQ ID NO: 203: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human rearranged kappa 
immunoglobulin gene subgroup V 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203: 
CGCCCTGCAG TCCAGAGCCC ATATCAATGC CTGGGTCAGA GCTCTGGA 
(2) INFORMATION FOR SEQ ID NO: 204: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human germline fragment for
immunoglobulin kappa light chain

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 204:

TGCCCTACCT TCCAGAGCCC ATATCAATGC CTGTGTCAGA GCCCTGGG 

(2) INFORMATION FOR SEQ ID NO: 205: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human germline immunoglobulin kappa 
light chain V-segment 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:205: 
ACTTCCCTTG TGGGTCTGAG ATAAAAGCTC AGCTCTAACC CTTACC 
(2) INFORMATION FOR SEQ ID NO: 206: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interleukin-2 (IL-2) gene 



WO 94/14980 



PCT/US93/12388 



315 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206: 
TATTTTTCCA GAATTAACAG TATAAATTGC ATCTCTTGTT CAAGAG 
(2) INFORMATION FOR SEQ ID NO: 207: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for interleukin 1 alpha
(IL-1 alpha) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207: 
CCACGCCTAC TTAAGACAAT TACAAAAGGC GAAGAAGACT GACTCAG 
(2) INFORMATION FOR SEQ ID NO: 208: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for prointerleukin 1 beta

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:208:
TTGATTGTGA AATCAGGTAT TCAACAGAGA AATTTCTCAG CCTCCTAC 
(2) INFORMATION FOR SEQ ID NO: 209: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for prointerleukin 1 beta

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:209:
CTACTTCTGC TTTTGAAAGC TATAAAAACA GCGAGGGAGA AACTGGC 
(2) INFORMATION FOR SEQ ID NO: 210:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human interleukin 2 receptor gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:210:

AGAAAGGATT CATAAATGAA GTTCAATCCT TCTCATCACC CCAGCCCA 

(2) INFORMATION FOR SEQ ID NO: 211: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interleukin 2 receptor gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:211: 
TTTGAAAAAT TACCGCAAAC TATATTGTCA TCAAAAAAAA AAAAAA 
(2) INFORMATION FOR SEQ ID NO: 212: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human interleukin 4 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212:
ATCTGGTGTA ACGAAAATTT CCAATGTAAA CTCATTTTCC CTCGG 
(2) INFORMATION FOR SEQ ID NO: 213: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human interleukin 4 gene



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213: 
GGTTTCAGCA ATTTTAAATC TATATATAGA GATATCTTTG TCAGCATT 
(2) INFORMATION FOR SEQ ID NO: 214:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interleukin 5 (IL-5) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:214:
CATTTCCTCA AAGACAGACA ATAAATTGAC TGGGGACGCA GTCTTGTACT 
(2) INFORMATION FOR SEQ ID NO: 215:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interleukin 7 (IL-7) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:215: 
TTGCTTTGAT TCAGGCCAGC TGGTTTTTCT GCGGTGATTC GGAAATTCGC 
(2) INFORMATION FOR SEQ ID NO: 216: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human interleukin 9 gene (IL-9)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216:
TTCCGTGTTT GAGAGGGAGC TTTAAATACC ACTCGATTTG AAGGTGTC 
(2) INFORMATION FOR SEQ ID NO: 217:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human int-1 mammary oncogene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217:
ACTTCAGCCA GCGCCGCAAC TATAAGAGGC GGTGCCGCCC GCCGT 
(2) INFORMATION FOR SEQ ID NO:218: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human jun-B gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:218:
TCCGTGGCTG ACTAGCGCGG TATAAAGGCG TGTGGCTCAG GCTGAG 
(2) INFORMATION FOR SEQ ID NO:219: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human DNA for 65 kD keratin type II 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 219:
GCCCAACAAC CTCCTCAAAT GTATATAAAG GGATTTTTAT TGCACA 
(2) INFORMATION FOR SEQ ID NO:220: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ultra high-sulphur keratin 
protein gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220:
TGGTGTGTTC CTATGTGGGA TATAAAGAGC CGGGGCTCAG GGGGCT 
(2) INFORMATION FOR SEQ ID NO: 221: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha-lactalbumin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221: 
CCTGAGGCTT TCTGCATGAA TATAAATAAA TGAAACTGAG TGATGCT 
(2) INFORMATION FOR SEQ ID NO: 222: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human LAG-1 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 222:
GTCCTAGGCC TCAGAGTCCC TATAAGAGAG ATTCCCAACT CAGTA 
(2) INFORMATION FOR SEQ ID NO: 223: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human lecithin-cholesterol
acyltransferase (LCAT) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 223: 
CTGAGGCTGT GCCCCTTTCC GGCAATCTCT GGCCACAACC CCCACTGG 
(2) INFORMATION FOR SEQ ID NO: 224: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human lecithin-cholesterol
acyltransferase (LCAT) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224:

CCCCTCCCAC TCCCACACCA GATAAGGACA GCCCAGTGCC GCTTT 

(2) INFORMATION FOR SEQ ID NO: 225:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human lymphocyte-specific protein 
kinase (lck) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225:

GGGAGCAGAT CTTGGGGGAG CCCCTTCAGC CCCCTCTTCC ATTCCCTCAG 

(2) INFORMATION FOR SEQ ID NO: 226: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human leukocyte function-associated
antigen-1 (LFA-1) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226:
GGGTATCTCA CTGTGGTTTG ATTTGCATTT CTCTAATGAC TAATAGTG 
(2) INFORMATION FOR SEQ ID NO:227:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human leukocyte function-associated
antigen-1 (LFA-1) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 227: 
ATGTCTCTAA CTTGCTTACA CTTCCTCCCT GAACCCTGCG GTTTCA 
(2) INFORMATION FOR SEQ ID NO: 228: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human leukocyte function-associated 
antigen-1 (LFA-1) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:228:
TCCTGCAGGC ACACCTCCCT CCCCGCCTGC CAGTGTCACC AGCCTGTT 
(2) INFORMATION FOR SEQ ID NO: 229: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human leukocyte function-associated 
antigen-1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:229: 
CTGTTGCCTC TGTGAGAAAG TACCACTGTA AGAGGCCAAA GGGCATGATC 
(2) INFORMATION FOR SEQ ID NO: 230: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human lipoprotein lipase (LPL) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:230:
TATTTGCATA TTTCCAGTCA CATAAGCAGC CTTGGCGTGA AAACAGT 
(2) INFORMATION FOR SEQ ID NO: 231: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human leukocyte adhesion molecule-1 
(LAM-1) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 231: 
TGGGTTAGAG AAATGAAAGA AAGCAAGGCT TTCTGTTGAC ATTCAGTGCA 
(2) INFORMATION FOR SEQ ID NO: 232: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human lysozyme gene 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232:
AGAAGGAAGT TAAAAGATGT TAAATACTGG GGCCAGCTCA CCCTGG 
(2) INFORMATION FOR SEQ ID NO: 233: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human mannose binding protein 1 
(MBP1) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 233: 
AGGGATGGGT CATCTATTTC TATATAGCCT GCACCCAGAT TGTAGG 
(2) INFORMATION FOR SEQ ID NO: 234: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human mast cell carboxypeptidase A
(MC-CPA) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:234:



CATCAAGATA AGGGCTGAGG CATAAAACTG CCAGAGGGTC TCAAGG



(2) INFORMATION FOR SEQ ID NO: 235: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human P-glycoprotein (MDR1) gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235: 
CTTTGCCACA GGAAGCCTGA GCTCATTCGA GTAGCGGCTC TTCCA 
(2) INFORMATION FOR SEQ ID NO: 236: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human bone marrow serine protease
gene (medullasin)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:236:
ACGGCCTCCC AGCACAGGGC TATAAGAGGA GCCGGGCGGG CACGG 
(2) INFORMATION FOR SEQ ID NO:237: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human microsomal epoxide hydrolase 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 237: 
TTGCTGTGCA GAGTCCAGGG GAGATAACCA CGCTGTGCAC ACATGAG 
(2) INFORMATION FOR SEQ ID NO: 238: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human metallothionein-Ie gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:238:
GCAGCCAGTT GCAGGGCTCC ATTCTGCTTT CCAACTGCCT GACTGCTTGT TC 
(2) INFORMATION FOR SEQ ID NO: 239: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human myoglobin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 239: 
TTGTCAAGCA TCCCAGAAGG TATAAAAACG CCCTTGGGAC CAGGCA 
(2) INFORMATION FOR SEQ ID NO: 240: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human melanoma growth stimulatory 
activity (MGSA) gene 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240: 
GCTTTCCAGC CCCAACCATG CATAAAAGGG GTTCGCGGAT CTCGGAG 47 
(2) INFORMATION FOR SEQ ID NO: 241: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha-MHC gene for myosin 
heavy chain (N-terminus) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:241: 
AGAGGGTGGG GGAAACGGGA TATAAAGGAA CTGGAGCTTT GAGGAG 46 
(2) INFORMATION FOR SEQ ID NO: 242:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human class II invariant gamma-chain
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 242:
GATTCCTCTC CAGCACCGAC TTTAAGAGGC GAGCCGGGGG GTCAG 
(2) INFORMATION FOR SEQ ID NO: 243: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human motilin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 243: 
CCCAGGGTTG GGAGGTATAT AAGAACCCGT CAGATCAGCC G 
(2) INFORMATION FOR SEQ ID NO: 244: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human myeloperoxidase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244:
CCACCCCCAG CTTAGAGGAC ATAAAAGCGC AGATTGAGCT AAGAGGAGCT 
(2) INFORMATION FOR SEQ ID NO: 245: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human mitochondrial RNA-processing 
endoribonuclease RNA gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245: 
AAACACAATT TCTTTAGGGC TATAAAATAC TACTCTGTGA AGCTGAGGA 
(2) INFORMATION FOR SEQ ID NO: 246:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human myc-oncogene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 246:
GAGGGAGGGA TCGCGCTGAG TATAAAAGCC GGTTTTCGGG GCTTTAT 
(2) INFORMATION FOR SEQ ID NO: 247:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human Na,K-ATPase beta subunit
(ATP1B) gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:247: 
GCACGGCCGC CGGGGCGCGG TATATAGTAA AGGTAGOGCG GGCGCA 
(2) INFORMATION FOR SEQ ID NO: 248: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human neuromedin K receptor

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 248:
GAAGCGTGGG ACCCCATGAG TATAAAGAGA GCCTGTAGCG CAGG 
(2) INFORMATION FOR SEQ ID NO: 249: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for heavy neurofilament
subunit (NF-H) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249: 
TTGGACCCGG CCGCGGCGGC TATAAAAGGG CCGGCGCCCT GGTCGT 
(2) INFORMATION FOR SEQ ID NO: 250: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human nuclear factor NF-IL6 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 250:
CGGTTGCTAC GGGCCGCCCT TATAAATAAC CGGGCTCAGG AGAAACT 
(2) INFORMATION FOR SEQ ID NO: 251: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human neurofilament subunit NF-L 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 251: 
TGCGTCAGGA CCTCCCGGCG TATAAATAGG GGTGGCAGAA CGGCGC 
(2) INFORMATION FOR SEQ ID NO:252: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human neurokinin-2 receptor (NK-2) 
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252:
TCTCTTCAGC GAAGGGGTTG ATTTATAAGG GTGTTTTCTG CTCTGACA 
(2) INFORMATION FOR SEQ ID NO: 253: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human n-myc gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 253: 
GGGTGTGTCA GATTTTTCAG TTAATAATAT CCCCCGAGCT TCAAAGCGC 
(2) INFORMATION FOR SEQ ID NO: 254: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human ornithine decarboxylase (ODC1) 
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254:
CCATGGCGAC CCGCCGGTGC TATAAGTAGG GAGCGGCGTG CCGTGG 
(2) INFORMATION FOR SEQ ID NO: 255: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ornithine transcarbamylase 
(OTC) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255: 
ATACACAGCG GTGGAGCTTG GCATAAAGTT CAAATGCTCC TACACC 
(2) INFORMATION FOR SEQ ID NO: 256: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human prepro-oxytocin-neurophysin 
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:256:
CTCCACCGAC GCAATGCCCA GGCATAAAAA GGCCAGGCCG AGAGACCGCC 
(2) INFORMATION FOR SEQ ID NO: 257: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human cytochrome P450scc gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:257: 
TATGGCCTTG AGCTGGTAGT TATAATCTTG GCCCTGGTGG CCCAG 
(2) INFORMATION FOR SEQ ID NO: 258: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human p53 gene for transmembrane 
related p53

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258:
CCCCTCCCAT GTGCTCAAGA CTGGCGCTAA AAGTTTTGAG CTTCTCAAAA 
(2) INFORMATION FOR SEQ ID NO: 259:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human Alzheimer's disease amyloid A4 
precursor gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 259: 
GGGAGGCCTG CGGGGTCGGA TGATTCAAGC TCACGGGGAC GAGCAGG 
(2) INFORMATION FOR SEQ ID NO: 260:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human Alzheimer's disease amyloid A4 
precursor gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 260:
CGGGGACGAG CAGGAGCGCT CTCGACTTTT CTAGAGCCTC AGCGTCCTAG GACT 
(2) INFORMATION FOR SEQ ID NO: 261: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human Alzheimer's disease amyloid A4 
precursor 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 261: 
GCGGGGTGGG CCGGATCAGC TGACTCGCCT GGCTCTGAGC CCCGCCGC 
(2) INFORMATION FOR SEQ ID NO: 262: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human Alzheimer's disease amyloid A4 
precursor gene 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 262: 
CCGCCGCCGC GCTCGGGCTC CGTCAGTTTC CTCGCCAGCG GTAGGCGAG 
(2) INFORMATION FOR SEQ ID NO:263: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for plasminogen activator 
inhibitor 1 (PAI-1) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 263: 
TATTTCCTGC CCACATCTGG TATAAAAGGA GGCAGTGGCC CACAGAG 
(2) INFORMATION FOR SEQ ID NO: 264: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human platelet-derived growth factor 
A-chain (PDGF) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264:
AGGGGCGCGG CGGCGGCGGC TATAACCCTC TCCCCGCCGC CGGCC
(2) INFORMATION FOR SEQ ID NO: 265: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human PGP9.5 gene for
neuron-specific ubiquitin 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 265: 
ACAGTGCGTC TGGCCGGCGC TTTATAGCTG CAGCCTGGCG CTCCGC
(2) INFORMATION FOR SEQ ID NO: 266: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human plasminogen gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 266:
CTCCACCGAC GCAATGCCCA GGCATAAAAA GGCCAGGCCG AGAGACCGCC 50



(2) INFORMATION FOR SEQ ID NO: 267:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 49 base pairs



(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human phenylethanolamine N-methylase
(PNMT) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 267: 
TCGGGGCGGG GGTCGGGCGG TAGAAAAAAG GGCCGCGAGG CGAGCGGGG 49 
(2) INFORMATION FOR SEQ ID NO: 268: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human opiomelanocortin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:268:
CTCCCCGTGT GCAGACGGTG ATATTTACCG CCAAATGCGA ACCAGGC 
(2) INFORMATION FOR SEQ ID NO: 269: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene PRB3L for proline-rich 
protein G1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:269:
GCCACTGTTC TGCTCCTCTT TATAAAGGGA GCTGCCATGG TTCTCC
(2) INFORMATION FOR SEQ ID NO: 270:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human PRB4 gene for proline-rich
protein Po

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:270:
CATTGTTTTG CTCCTCTTTA TAAAGGGAGT TGCCACGTTC CTCC 
(2) INFORMATION FOR SEQ ID NO: 271: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human prolactin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:271: 
AGGCTTTGAT ATCAAAGGTT TATAAAGCCA ATATCTGGGA AAGAGA
(2) INFORMATION FOR SEQ ID NO: 272: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:
(C) INDIVIDUAL ISOLATE: Human prothymosin-alpha gene 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO:272: 
CCGAGCGCCG CCCACTAATC TATATTAAAG CTTCTGGCGC CGCGTG 46 
(2) INFORMATION FOR SEQ ID NO: 273: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human protamine 2 gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 273: 
TCATAGTGGG CGTCCCCCTT TATATACAAG CTCCCGGGGA GCCTTG 46 
(2) INFORMATION FOR SEQ ID NO: 274: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human SPR2-1 gene for small proline 
rich protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:274:
CTGGGTGGGG TAGCAGGCTC TATAAAGAGA TCCTCTGCTG CACGAC 
(2) INFORMATION FOR SEQ ID NO: 275: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human estrogen-responsive gene pS2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 275: 
TAAGCAAACA GAGCCTGCCC TATAAAATCC GGGGCTCGGG CGGCCTC 
(2) INFORMATION FOR SEQ ID NO:276: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human pulmonary surfactant 
apoprotein (PSAP) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 276:
AGCCTGGCAG CCCCCACATC TATAAATGCT GCGTCTACCT TACCCT
(2) INFORMATION FOR SEQ ID NO: 277:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for prostatic secretory 
protein PSP-94 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:277: 
TGCGTGGTTG CCCTCTCCAG TATAAAAGTT TGATGCAGCT TTTCC 
(2) INFORMATION FOR SEQ ID NO: 278: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:
(C) INDIVIDUAL ISOLATE: Human parathyroid hormone-related 
peptide (PTHRP) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 278:
GAGGTAGACA GACAGCTATG TATATATATG TGGGTTTCGC TACAAGTGG
(2) INFORMATION FOR SEQ ID NO: 279:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene for purine nucleoside
phosphorylase 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:279: 
CTGGGGACTC CAGGGCAAGG GATATAAGCC AGAGCCTAGA CCAGTG 
(2) INFORMATION FOR SEQ ID NO: 280: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human rDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 280:
ATTTTGGGCC GCCGGGTTAT
(2) INFORMATION FOR SEQ ID NO: 281:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human regenerating protein (reg) 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:281: 
GTTCTTATCT CAGATCCTGA TATAAAGCTC CTACAGCTAC CTGGCC 
(2) INFORMATION FOR SEQ ID NO: 282:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human renin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 282:
ATCACCCCAT GCATGGAGTG TATAAAAGGG GAAGGGCTAA GGGAGCC 47 
(2) INFORMATION FOR SEQ ID NO: 283: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene fragment for retinol 
binding protein (RBP) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 283: 
CGACCCCCTC CCCCCGGCGC TATAAAGCAG CGGGGCGGCC GCGGCG 46 
(2) INFORMATION FOR SEQ ID NO: 284:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human serum amyloid A (GSAA1) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 284:
CACCCCGCTA ATTTAAAAAA TATATATACA GATATATAGT GGAGATGG 
(2) INFORMATION FOR SEQ ID NO: 285: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human SAA1 beta gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 285: 
AACCAGCAGG GAAGGCTCAG TATAAATAGC AGCCACCGCT CCCTGGC 
(2) INFORMATION FOR SEQ ID NO: 286: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene fragment for HLA class
SB 4-beta chain

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 286:

CTACTTGGGT TCATGGTCTC TAATATTTCA AACAGGAGCT CCCTTTAG 48 

(2) INFORMATION FOR SEQ ID NO: 287: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human c-sis proto-oncogene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 287: 
TCGCACTCTC CCTTCTCCTT TATAAAGGCC GGAACAGCTG AAAGGG 46 
(2) INFORMATION FOR SEQ ID NO: 288: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human SLPI gene for secretory 
leukocyte protease inhibitor

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 288:
CACACCCACT GGTGAAAGAA TAAATAGTGA GGTTTGGCAT TGGCCA 
(2) INFORMATION FOR SEQ ID NO: 289: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human superoxide dismutase (SOD-1)
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 289:
CGAGGCGCGG AGGTCTGGCC TATAAAGTAG TCGCGGAGAC GGGGTG
(2) INFORMATION FOR SEQ ID NO: 290:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ornithine decarboxylase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 290:
TCCATGGCGA CCCGCCGGTG CTATAAGTAG GGAGCGGCGT GCCGT
(2) INFORMATION FOR SEQ ID NO: 291:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human steroid 5-alpha-reductase gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:291: 
CTGCCCCCGC GCCGCCGCCC TATATGTTGC CCGCCGCGGC CTCTG 
(2) INFORMATION FOR SEQ ID NO:292: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human substance P receptor gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:292:
GTGACGTCTC TGCAGGGGGT TATAAAAGCC TCGTGCGCAG CTAA 
(2) INFORMATION FOR SEQ ID NO: 293: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human synaptobrevin 1 (SYB1) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:293: 
CCGGGAGGCG TGGTCAGCAC TAATAAAGGC GGAGGCCGGC GCGGCA 
(2) INFORMATION FOR SEQ ID NO: 294: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human tyrosine aminotransferase
(TAT) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 294:
CAACGCCCAT TTGTGGAGAC TATTTCAGGA GTTAGGATTT GCATCTG 
(2) INFORMATION FOR SEQ ID NO: 295: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human T-cell receptor V-beta gene



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 295: 
GACAGATGCA TTCTGTGGGG ATAAAATGTC ACAAAATTCA TTTCTTT 
(2) INFORMATION FOR SEQ ID NO: 296: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human T-cell receptor V-beta gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 296:
TCACAGAGGG CCTGGTCTAG AATATTCCAC ATCTGCTCTC ACTCT 
(2) INFORMATION FOR SEQ ID NO:297: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human T-cell receptor V-beta gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 297: 
GACAGATGCA TTCTGTGGGG ATAAAATGTC ACAAAATTCA TTTCTTT 
(2) INFORMATION FOR SEQ ID NO: 298: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human T-cell receptor V-beta gene 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 298: 
TCACAGAGGG CCTGGTCTGG AATATTCCAC ATCTGCTCTC ACTCTG 
(2) INFORMATION FOR SEQ ID NO:299: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human T-cell receptor V-beta chain 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:299: 
TGTTACTGTA GGAACTACCG TATAAGGACA GGATGTCCCA CCTCC 
(2) INFORMATION FOR SEQ ID NO: 300: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human transferrin (Tf) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:300:
CCGCCCAGGC CGGGAATGGA ATAAAGGGAC GCGGGGCGCC GGAGG 
(2) INFORMATION FOR SEQ ID NO: 301: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interleukin 3 gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 301: 
GGGCACCTTG 

(2) INFORMATION FOR SEQ ID NO: 302: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human tissue factor gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 302:
CGGGAGAGCG CGCCGCCGGC CCTTTATAGC GCGCGGGGCA CCGGCTCCCC 
(2) INFORMATION FOR SEQ ID NO: 303: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human transforming growth 
factor-beta (TGF-beta) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 303: 
TGCCTTGCCC ATGGGGGCTG TATTTAAGGA CACCGTGCCC CAAGCCC 
(2) INFORMATION FOR SEQ ID NO: 304: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human transforming growth factor
beta-3 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 304:
GAGACGTCAT GGGAGGGAGG TATAAAATTT CAGCAGAGAG AAATAGA 
(2) INFORMATION FOR SEQ ID NO: 305: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human transforming growth factor 
beta-2 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:305: 
CACGTGGTTC AGAGAGAACT TATAAATCTC CCCTCCCCGC GAAGA 
(2) INFORMATION FOR SEQ ID NO: 306: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human tyrosine hydroxylase (TH) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 306:
GGCTTTGACG TCAGCTCAGC TTATAAGAGG CTGCTGGGCC AGGGCT 
(2) INFORMATION FOR SEQ ID NO: 307: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human metallothionein gene IIA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 307: 
TCGTCCCGGC TCTTTCTAGC TATAAACACT GCTTGCCGCG CTGCAC 
(2) INFORMATION FOR SEQ ID NO: 308:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human thrombospondin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 308:
CCCAGGAATG CGAGCGCCCC TTTAAAAGCG CGCGGCTCCT CCGCCT 
(2) INFORMATION FOR SEQ ID NO: 309: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human thyroxine-binding globulin 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 309: 
ATAATGTTGC TATAACATCT GAATGACAGT CCATGGCATT STTTC 
(2) INFORMATION FOR SEQ ID NO: 310: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human thyroglobulin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 310:
GAAAGTGCCA ACGGCAGCTC TATAAAAGCT CCCTGGCCAG GGGACCT 
(2) INFORMATION FOR SEQ ID NO: 311: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for tumor necrosis factor 
(TNF-alpha) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 311: 
CTCCTCTCGC CCCAGGGACA TATAAAGGCA GTTGTTGGCA CACCCA 
(2) INFORMATION FOR SEQ ID NO: 312: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human lymphotoxin (TNF-beta) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 312:
GCTGCCACTG CCGCTTCCTC TATAAAGGGA CCTGAGCGTC CGGGCC 
(2) INFORMATION FOR SEQ ID NO: 313: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human type I DNA topoisomerase gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 313: 
TGACGTCGCC GACGTGTTGT TTAAAAGCGG ^gpGCGCAGGC GCAGTGAGCC 
(2) INFORMATION FOR SEQ ID NO:314: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human triosephosphate isomerase 
(TPI) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:314:
AGTTCCACTT CGCGGCGCTC TATATAAGTG GGCAGTGGCC GGACTGC 
(2) INFORMATION FOR SEQ ID NO: 315: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human thyroid peroxidase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 315:
ATCCAAGCGC AGAGTCAGTT TATAAGGTGG ^3TAACCAAGT CCCT 
(2) INFORMATION FOR SEQ ID NO: 316:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human transferrin receptor gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:316:
GGCCGGGGGC GGGGCCAGGC TATAAACCGC CGGTTAGGGG CCGCCA 
(2) INFORMATION FOR SEQ ID NO: 317: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human tryptase-I gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 317: 
CGCCCCCTCC TGATCTGGAA GGATAAATGG GGAGGGGAGA GCCACTGGGT 
(2) INFORMATION FOR SEQ ID NO:318: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human beta 2 gene for beta-tubulin

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:318:
GCGGAGGCGG GCAGGGAGGG TATATAAGCG TTGGCGGACG GTCGGT 
(2) INFORMATION FOR SEQ ID NO: 319: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene for U6 RNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 319: 
GTATTTCGAT TTCTTGGCTT TATATATCTT GTGGAAAGGA CGAAAC 
(2) INFORMATION FOR SEQ ID NO: 320:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human uPA gene for 
urokinase-plasminogen activator

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:320:
GGCGGCGCCG GGGCGGGCCC TGATATAGAG CAGGCGCCGC GGGTCGC 
(2) INFORMATION FOR SEQ ID NO: 321: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human proto-oncogene vav 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:321:
GCAGGCGTGC GGGCGGGTGG GTGGTGGAGG CTGCGA 
(2) INFORMATION FOR SEQ ID NO: 322: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human vascular cell adhesion
molecule-1 (VCAM1) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 322:
GCCTCTGCAA CAAGACCCTT TATAAAGCAC AGACTTTCTA TTTCA 
(2) INFORMATION FOR SEQ ID NO: 323: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human vimentin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:323: 
ACCCTCTTTC CTAACGGGGT TATAAAAACA GCGCCCTCGG CGGGG 
(2) INFORMATION FOR SEQ ID NO: 324:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human U1 RNA gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:324:
GTAAAGGGTG AGGTATATGG AGCTGTGACA GGGCAGAAGT GTGTGAAGTC 
(2) INFORMATION FOR SEQ ID NO: 325: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for small nuclear U1 RNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 325: 
GTAAAGAGTG AGGCGTATGA GGCTGTGTCG GGGCAGAGGC CCAAGATCTC 
(2) INFORMATION FOR SEQ ID NO: 326: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human small nuclear U2 RNA gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 326:
TTGAATGTGG ATGAGAGTGG GACGGTGACG GCGGGCGCGA AGGCGAGCGC 
(2) INFORMATION FOR SEQ ID NO: 327: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human U3 small nuclear RNA gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 327: 
AAAAGTTTGC GGCAGATGTA GACCTAGCAG AGGTGTGCGA GGAGGCCGTT 
(2) INFORMATION FOR SEQ ID NO: 328:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human U4C small nuclear RNA gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:328:
AAATGGTAGT CATCATCCGT GGGGGAGCGG GGCGCGAATA AAGCCTTTCC 
(2) INFORMATION FOR SEQ ID NO: 329: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human histone H3.3 gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:329: 
GGGCGGGGCG GCGTGTGTTG GGGGATAGCC TCGGTGTCAG CCATCTTTCA 
(2) INFORMATION FOR SEQ ID NO: 330: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human histone H4 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 330:
AGTTCGGTCC GCCAACTGTC GTATAAAGGC GCTGCCTCAG GTCAGAGGCC 
(2) INFORMATION FOR SEQ ID NO: 331: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human non-histone chromosomal 
protein HMG-14 gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 331: 
TGGGGGGCGG CCCGGCCGGC GGGGAGGGGG AGCCGCGGCC GGGACGCGGG 
(2) INFORMATION FOR SEQ ID NO: 332: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ribosomal protein S14 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 332:
AAGTAATAAA CCGTCTTTCC TTATGACGAG TCTTAAACTC TTTGGGAGGA
(2) INFORMATION FOR SEQ ID NO: 333:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene for alpha tubulin (b 
alpha 1)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:333:
CGCGACCGAG GGTCTGGGCG TCCCGGCTGG GCCCCGTGTC TGTGCGCACG 
(2) INFORMATION FOR SEQ ID NO: 334:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human skeletal alpha-actin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 334:
AGGGAATCGC CCGCGGGCTA TATAAAACCT GAGCAGAGGG ACAAGCGGCC 
(2) INFORMATION FOR SEQ ID NO:335: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human epidermal 67-kDa Keratin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 335: 
GGAAGATCTT GTGTGATAAA ACAATTACCA CATGAACCAA TCTTGCATGC 
(2) INFORMATION FOR SEQ ID NO: 336: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human 50 kDa type I epidermal keratin
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 336:
GACCCGCCCC CTACCCATGA GTATAAAGCA CTCGCATCCC TTTGCAATTT 
(2) INFORMATION FOR SEQ ID NO: 337:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha-1 collagen type I gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:337:
CTGCTCTCCA TCAGGACAGT ATAAAAGGGG CCCGGGCCAG TCGTCGGAGC 
(2) INFORMATION FOR SEQ ID NO: 338: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human collagen type-III gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:338:
GTGAGGGAAG CCAAACTTTT TCCTATTTAA GGCCAAAGCA AAGGAATCTC
(2) INFORMATION FOR SEQ ID NO: 339: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human pro-alpha-2 (I) mRNA for 
collagen N-prepropeptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 339: 
CAGGGAAACT TTTGCCGTAT AAATAGGGCA CATCCGGGAT TTGTTATTTT 
(2) INFORMATION FOR SEQ ID NO: 340: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human fibronectin (FN) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 340:
TCCAGAGGGG CGGGAGGGCC GTCCCATATA AGCCCGGCTC CCGCGCTCCG 
(2) INFORMATION FOR SEQ ID NO: 341: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human von Willebrand factor gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 341:
TGTTTCCTTT TGGTAATTAA AAGGAGGCCA ATCCCCTGTT GTGGCAGCTC 
(2) INFORMATION FOR SEQ ID NO: 342: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene for fibrinogen gamma
chain

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 342:
TGGCTATCCC AGGAGCTTAC ATAAAGGGAC AATTGGAGCC TGAGA 
(2) INFORMATION FOR SEQ ID NO: 343: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for fibrinogen gamma 
chain 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:343: 
CAGTCCTGGC TATCCCAGGA GCTTACATAA AGGGACAATT GGAGCCTGAG 
(2) INFORMATION FOR SEQ ID NO: 344 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human involucrin mRNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 344:

AGGCCAGGCT GCAGAATGAT ATAAAGAGTG CCCTGACTCC TGCTCAGCTC 

(2) INFORMATION FOR SEQ ID NO:345 : 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein A-I and C-III
genes

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 345 : 
CCAGACCCTG GCTGCAGACA TAAATAGGCC CTGCAAGAGC TGGCTGCTTA 
(2) INFORMATION FOR SEQ ID NO:346: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human apolipoprotein B-100 (apoB)
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 346:
GCTCTTGCAG CCTGGGCTTC CTATAAATGG GGTGCGGGCG CCGGCCGCGC



(2) INFORMATION FOR SEQ ID NO: 347: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL : NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human apolipoprotein A-I and C-III
genes

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 347:
TCTAGGGATG AACTGAGCAG
(2) INFORMATION FOR SEQ ID NO: 348: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human apolipoprotein A-I and C-III
genes

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:348:
ACAGGCAGGA GGGTTCXGAC CTGTTTTATA TCATCTCCAG GGCAGCAGGC A 
(2) INFORMATION FOR SEQ ID NO: 349: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human albumin gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 349: 
TACAATTATT GGTTAAAGAA GTATATTAGT GCTAATTTCC CTCCGTTTGT 
(2) INFORMATION FOR SEQ ID NO: 350: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human albumin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 350:
TACAATTATT GGTTAAAGAA GTATATTAGT GCTAATTTCC CTCCGTTTGT C 
(2) INFORMATION FOR SEQ ID NO: 351 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human serum prealbumin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:351: 
CCTAGCTCAG GAGAAGTGAG TATAAAAGCC CCAGGCTGGG AGCAGCCATC 
(2) INFORMATION FOR SEQ ID NO: 352: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha-fetoprotein (AFP) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 352:
TAACAGGCAT TGCCTGAAAA GAGTATAAAA GAATTTCAGC ATGATTTTCC 
(2) INFORMATION FOR SEQ ID NO: 353: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human C-reactive protein gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 353: 
AGGCAGGAGG AGGTAGCTCT AAGGCAAGAG ATCTGGGACT T 
(2) INFORMATION FOR SEQ ID NO: 354: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene A for alpha 1-acid 
glycoprotein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:354:
AAGTGACCGC CCATAGTTTA TTATAAAGGT GACTGCACCC TGCAGCCACC 
(2) INFORMATION FOR SEQ ID NO: 355: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene A for alpha 1-acid 
glycoprotein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 355: 
AAGTGACCGC CCATAGTTTA TTATAAAGGT GACTGCACCC TGCAGCCACC A 
(2) INFORMATION FOR SEQ ID NO: 356: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for L apoferritin

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 356:
CGGCGCACCA TAAAAGAAGC CGCCCTAGCC ACGTCCCCTC 
(2) INFORMATION FOR SEQ ID NO: 357: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for L apoferritin 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:357: 
CGGCGCACCA TAAAAGAAGC CGCCCTAGCC ACGTCCCCTC G 
(2) INFORMATION FOR SEQ ID NO: 358: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Olive baboon alpha-1 globin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 358:
GGCGTGCCCC CGCGCCCGGA GCATAAACCC TGGCGCGCTC GCGGCCCGGC 
(2) INFORMATION FOR SEQ ID NO: 359: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Olive baboon alpha-1 globin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 359: 
GGCGTGCCCC CGCGCCCGGA GCATAAACCC TGGCGCGCTC GCGGCCCGGC A 
(2) INFORMATION FOR SEQ ID NO: 360: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha-globin germ line gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 360:
GTGCCAACAA TGGAGGTGTT TACCTGTCTC AGACCAAGGA CCTCTCTGCA 
(2) INFORMATION FOR SEQ ID NO: 361: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Chimpanzee gene for alpha-like
zeta-1-globin

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 361: 
CCTGGCTGGG CCCAGCTCCC TGTATATAAG GGGACCCTGG GGGCTGAGCA 
(2) INFORMATION FOR SEQ ID NO: 362: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Chimpanzee gene for alpha-like
zeta-1-globin

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 362:
CCTGGCTGGG CCCAGCTCCC TGTATATAAG GGGACCCTGG GGGCTGAGCA C 
(2) INFORMATION FOR SEQ ID NO: 363: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha globin gene cluster 
chromosome 16: zeta gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 363: 
CTGGCTGGGC CCAGCTCCCT GTATATAAGG GGACCCTGGG GGCTGAGCAC 
(2) INFORMATION FOR SEQ ID NO: 364: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human theta 1-globin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 364:
CCGCGGGACC CCTGGCCGGT CCGCGCAGGC GCAGCGGGGT CGCAGGGCGC 
(2) INFORMATION FOR SEQ ID NO: 365: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Macaque cynomolgus beta-globin gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 365:
GCAGGAGCCA GGGCTGGGCA TAAAAGTCAG GGCAGAGCCA TCTATTGCTT
(2) INFORMATION FOR SEQ ID NO: 366: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Chimpanzee beta-globin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 366:
GCAGAAGCCA GGGCTGGGCA TAAAAGTCAG GGCAGAGCCA TCTATTGCTT
(2) INFORMATION FOR SEQ ID NO: 367: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human germ line gene for beta-globin

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 367:
GCAGGAGCCA GGGCTGGGCA TAAAAGTCAG GGCAGAGCCA TCTATTGCTT
(2) INFORMATION FOR SEQ ID NO: 368: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Spider monkey (A. geoffroyi)
delta-globin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:368:

CAGGGAGAAC AGGACCAGCA TAAAAGGCAG GGCAGGGCTA ACTGTTGCTT 

(2) INFORMATION FOR SEQ ID NO: 369: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human transferrin receptor gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 369:
GGGCGGGGCC AGGCTATAAA CCGCCGGTTA GGGGCCGCCA TCCCCTCAGA 
(2) INFORMATION FOR SEQ ID NO: 370: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human beta-2-adrenergic receptor
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:370:
AGTTCCCCTA AAGTCCTGTG CACATAACGG GCAGAACGCA CTGCGAAGCG 
(2) INFORMATION FOR SEQ ID NO: 371: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human IgE receptor gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 371: 
GGTGGCAAGC CCATATTTAG GTCTATGAAA ATAGAAGCTG TCAGTGGCTC 
(2) INFORMATION FOR SEQ ID NO: 372: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human oncogene c-fos

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 372:
TTCATAAAAC GCTTGTTATA AAAGCAGTGG CTGCGGCGCC TCGTACTCCA 
(2) INFORMATION FOR SEQ ID NO:373: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human c-myc oncogene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 373: 
AATCTCCGCC CACCGGCCCT TTATAATGCG AGGGTCTGGA CGGCTGAGGA 
(2) INFORMATION FOR SEQ ID NO: 374: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human B-cell leukemia/lymphoma 2
(bcl-2) proto-oncogene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 374:
CCGCCCCTCC GCGCCGCCTG CCCGCCCGCC CGCCGCGCTC CCGCCCGCCG 
(2) INFORMATION FOR SEQ ID NO: 375: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human p53 gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:375: 
ACTCCATTTC CTTTGCTTCC TCCGGCAGGC GGATTACTTG CCCTTACTTG 
(2) INFORMATION FOR SEQ ID NO: 376:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene homologous to bladder 
carcinoma oncogene T24

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 376:
CGCGGCCCTA CTGGCTCCGC CTCCCGCGTT GCTCCCGGAA GCCCCGCCCG 
(2) INFORMATION FOR SEQ ID NO: 377: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human c-abl gene



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:377: 

GGGGCGGGCC TGGCGGGCGC CCTCTCCGGG CCCTTTGTTA ACAGGCGCGT 

(2) INFORMATION FOR SEQ ID NO: 378: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human metallothionein-I-A gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:378:
CGGCCCTCTT TCCCCTGACC ATAAAAGCAG CCGCTGGCTG CTGGGCCCTA 
(2) INFORMATION FOR SEQ ID NO: 379: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human metallothionein I-B gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 379: 
ACCCCACCAC CTCCCCCGAC TATAAAGGAG CAGCCAGCTC CTGGGCTCCA 
(2) INFORMATION FOR SEQ ID NO: 380: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human metallothionein-If (MT-IF) 
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:380:
CCCGGCCCCC TCCCCTGACT ATCAAAGCAG CGGCCGGCTG TTTGGGTCCA 
(2) INFORMATION FOR SEQ ID NO: 381: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for 27 kDa heat shock
protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 381:
CCCTCAAACG GGTCATTGCC ATTAATAGAG ACCTCAAACA CCGCCTGCTA 
(2) INFORMATION FOR SEQ ID NO: 382: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human 70 kDa heat shock protein gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:382:

GCGGGTCTCC GTGACGACTT ATAAAACCCC AGGGGCAAGC GGTCCGGATA 

(2) INFORMATION FOR SEQ ID NO: 383: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human macrophage alpha1-antitrypsin
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 383: 
TGCCTCCACC CGAAGTCTAC TTCCTGGGTG GGCAGGAACT GGGCACTGTG 
(2) INFORMATION FOR SEQ ID NO: 384 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human alpha1-antitrypsin (S variant)
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 384:
CGTTGCCCCT CTGGATCCAC TGCTTAAATA CGGACGAGGA CAGGGCCCTG 
(2) INFORMATION FOR SEQ ID NO:385: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human S variable segment 5' of
antithrombin III gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 385: 
TCTGCCCCAC CCTGTCCTCT GGAACCTCTG CGAGATTTAG AGGAAAGAAC 
(2) INFORMATION FOR SEQ ID NO: 386: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human pulmonary surfactant protein 
(SP5) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 386:
CCCCTCTCCC TACGGACACA TATAAGACCC TGGTCACACC TGGGAGAGGA 
(2) INFORMATION FOR SEQ ID NO:387: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human Immunoglobulin kappa L-chain V 
region gene (HX122) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 387:
CCCCCTGCCC TGAAGACTTT TTATAGGCTG GTCACACCCG GAGCAGGAGT 
(2) INFORMATION FOR SEQ ID NO: 388: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human T cell receptor 
V-alpha/J-alpha chain (rearranged) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:388: 
TTAAGGTTTG AATCCTCAGT GAACCAGGGC AGAAAAGAAT GATGAAATCC 
(2) INFORMATION FOR SEQ ID NO: 389: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene for HLA-DR alpha heavy 
chain (class II antigen) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:389: 
TGCATTTTAA TGGTCAGACT CTATTACACC CCACATTCTC TTTTCTTTTA 
(2) INFORMATION FOR SEQ ID NO: 390: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human MHC class II HLA-DC-3-beta gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:390:
CTACCACGCA TGGAAACATC CACAGATTTT TATTCTTTCT GCCAGGTACA 
(2) INFORMATION FOR SEQ ID NO: 391:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human T-cell receptor CD3-gamma gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 391: 
GCCTTCTCTC AAAGGCCCCA GCCCCAACAG TGATGGGTGG AGCCAGTCTA 
(2) INFORMATION FOR SEQ ID NO: 392:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human pregnancy-specific beta-1 
glycoprotein mRNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 392:
CTGCCCTGGG AAGAGGCTCA GCACAGAAAG AGGAAGGACA GCACAGCTGA 
(2) INFORMATION FOR SEQ ID NO: 393: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human pregnancy-specific 
beta-1-glycoprotein 5 (PSG5)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 393: 
AGAGAGGAGG GGACAGAGAG GTGTCCTGGG CCTGACCCCA CCCATGAGCC 
(2) INFORMATION FOR SEQ ID NO: 394: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human factor VIII gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 394:
CCTGTGGCTG CTTCCCACTG ATAAAAAGGA AGCAATCCTA TCGGTTACTG 
(2) INFORMATION FOR SEQ ID NO: 395: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ubiquitin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:395: 

TGACGCAACA CTCGTTGCAT AAATTTGCCT CCGCCAGCCC GGAGCATTTA 

(2) INFORMATION FOR SEQ ID NO:396: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human proliferating cell nucleolar
protein P120 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 396:
ACTATAATAC GCCAAGCGTG CGTTCTGCCG TTCCCTCCGA CACGCGCGAC 
(2) INFORMATION FOR SEQ ID NO: 397: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for delta-globin 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:397: 
CAGGGAGGAC AGGACCAGCA TAAAAGGCAG GGCAGAGTCG ACTGTTGCTT 
(2) INFORMATION FOR SEQ ID NO: 398: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Gorilla fetal A-gamma-globin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 398:
CGGCTGGCTA GGGATGAAGA ATAAAAGGAA GCACCCTCCA GCAGTTCCAC 
(2) INFORMATION FOR SEQ ID NO: 399: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for fetal A-gamma and 
G-gamma hemoglobin

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 399: 
CGGCTGGCTA GGGATGAAGA ATAAAAGGAA GCACCCTTCA GCAGTTCCAC 
(2) INFORMATION FOR SEQ ID NO: 400: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Orangutan epsilon-globin gene with 
flanking Alu repeats

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 400:
CAGAACTTCG GCAGTGAAGA ATAAAAGGCC ACACAGAGAG GCAGCAGCAC 
(2) INFORMATION FOR SEQ ID NO: 401: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human haptoglobin (Hpl)
gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 401:
TAAAAAGACC AGCAGATGCC CCACAGCACT GCTCTTCCAG AGGCAAGACC 
(2) INFORMATION FOR SEQ ID NO: 402: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 71 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human low molecular weight 
oligoadenylate synthetase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 402:
AAGACAGCTC CTCCCTTCTG AGGAAACGAA ACCAACAGCA GTCCAAGCTC AGTCAGCAGA 
AGAGATAAAA G 

(2) INFORMATION FOR SEQ ID NO: 403 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene fragment for 

dihydrofolate reductase 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 403: 
GGGGGGCGGG GCCTCGCCTG CACAAATAGG GACGAGGGGG CGGGGCGGCC 
(2) INFORMATION FOR SEQ ID NO: 404: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human thymidine kinase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 404:
GGCTCGTGAT TGGCCAGCAC GCCGTGGTTT AAAGCGGTCG GCGCGGGACC 
(2) INFORMATION FOR SEQ ID NO: 405: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human adenosine deaminase gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 405: 
GCGGGAGGCG GGGCCCGGCC CGTTAAGAAG AGCGTGGCCG GCCGCGGCC 
(2) INFORMATION FOR SEQ ID NO: 406: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human argininosuccinate synthase

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 406:

TGCCCCCGGG CCCTGTGCTT ATAACCTGGG ATGGGCACCC CTGCCAGTCC 

(2) INFORMATION FOR SEQ ID NO: 407: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human ornithine aminotransferase 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 407: 
GGGGGCGGGG CAGAATCAGC CTTTAAGTTG CAGTGACGCT CCGGCGTCAC 
(2) INFORMATION FOR SEQ ID NO: 408:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human tyrosine hydroxylase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 408:
TGACGTCAGC TCAGCTTATA AGAGGCTGCT GGGCCAGGGC TGTGG 
(2) INFORMATION FOR SEQ ID NO: 409: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human HMG CoA reductase gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 409: 
CAGCTCCGAG CGTGCGTAAG GTGAGGGCTC CTTCCGCTCC GCGACTGCGT 
(2) INFORMATION FOR SEQ ID NO: 410: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for lecithin-cholesterol 
acyltransferase

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 410:
CCTAGGGCCC CTCCCACTCC CACACCAGAT AAGGACAGCC CAGTGCCGCT 
(2) INFORMATION FOR SEQ ID NO: 411: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human porphobilinogen deaminase gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 411: 
CGCCCAGAGG GAGGGACCTC CCCTTCGAGG GAGGGCGCCG GAAGTGACGC 
(2) INFORMATION FOR SEQ ID NO: 412: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human porphobilinogen deaminase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 412:
GCACAGCACT CCCACTGACA ACTGCCTTGG TCAAGGTGGG CTTCAGGGCT 
(2) INFORMATION FOR SEQ ID NO: 413: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human URO-D gene for 
uroporphyrinogen decarboxylase 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 413: 
GGGGGGCAGG CTCAGATTCA GGTTAAATTG TGGATTGAGC TCGCAGTTAC 
(2) INFORMATION FOR SEQ ID NO: 414: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human URO-D gene for
uroporphyrinogen decarboxylase

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 414:
GGGGGGCAGG CTCAGATTCA GGTTAAATTG TGGATTGAGC TCGCAGTTAC A 
(2) INFORMATION FOR SEQ ID NO:415: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase B (ALDOB) gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 415:
AAAAAAAAAA CATGATGAGA AGTCTATAAA AATTGTGTGC TACCAAAGAT 
(2) INFORMATION FOR SEQ ID NO: 416: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 416:
GGTGGCGCTG CTCACCACAC ACAAGTGTTA TAGGAGGAGT CTGGCCCTTG 
(2) INFORMATION FOR SEQ ID NO: 417:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human aldolase A gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 417: 
GGTGGCGCTG CTCACCACAC ACAAGTGTTA TAGGAGGAGT CTGGCCCTTG A 
(2) INFORMATION FOR SEQ ID NO: 418:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 418:
TGTGGGGCGG GCAGGAGCTG CCTTATAACC AGCCCGGGAA CCCCTAGCTC 
(2) INFORMATION FOR SEQ ID NO: 419: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 419: 
TGTGGGGCGG GCAGGAGCTG CCTTATAACC AGCCCGGGAA CCCCTAGCTC A 
(2) INFORMATION FOR SEQ ID NO: 420: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 420:
GCTCGGCGGA GGGCGGAGTG GTGCCTTTAA AAGGCCGGGC GCCGCCTTCC 
(2) INFORMATION FOR SEQ ID NO: 421: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:421: 
GCTCGGCGGA GGGCGGAGTG GTGCCTTTAA AAGGCCGGGC GCCGCCTTCC G 
(2) INFORMATION FOR SEQ ID NO: 422: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human aldolase A gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 422:
GCTAAATCGG CTGCGTTCCT CTCGGAACGC GCCGCAGAAG GGGTCCTGGT 
(2) INFORMATION FOR SEQ ID NO: 423: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human aldolase A gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 423:
GCTAAATCGG CTGCGTTCCT CTCGGAACGC GCCGCAGAAG GGGTCCTGGT G 
(2) INFORMATION FOR SEQ ID NO: 424: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human phosphoglycerate kinase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 424:
GAGGCGGGGT GTGGGGCGGT AGTGTGGGCC CTGTTCCTGC CCGCGCGGTG 
(2) INFORMATION FOR SEQ ID NO: 425: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene for glucose 6-phosphate 
dehydrogenase gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 425: 
CAGGCGCCCG CCCCCGCCCC CGCCGATTAA ATGGGCCGGC GCGGCTCAGC 
(2) INFORMATION FOR SEQ ID NO: 426:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human hepatic lipase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 426:
GCAGTCTTCC CTAACAAAGT ATCTAATAGG CATTGTGGTC TCTTTGGCTT 
(2) INFORMATION FOR SEQ ID NO: 427: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human hepatic lipase mRNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 427: 
GCAGTCTTCC CTAACAAAGT ATCTAATAGG CATTGTGGTC TCTTTGGCTT C 
(2) INFORMATION FOR SEQ ID NO: 428: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human protein C gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 428:
AGTGCTGAGG GCCAAGCAAA TATTTGTGGT TATGGATTAA CTCGAACTCC 
(2) INFORMATION FOR SEQ ID NO: 429: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human factor IX gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 429:
CCAGAAGTAA ATACAGCTCA GCTTGTACTT TGGTACAACT AATCGACCTT
(2) INFORMATION FOR SEQ ID NO: 430: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human MHC III HLA factor B gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 430:
GCAGGTGCCA GAACACAGAT TGTATAAAAG GCTGGGGGCT GGTGGGGAGC 
(2) INFORMATION FOR SEQ ID NO: 431: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human pepsinogen gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 431: 
CGATAAGGCG GGACCCAACT TGTATATAAG GGCAGCTCAT GCTGCTGCTC 
(2) INFORMATION FOR SEQ ID NO: 432:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human pepsinogen C gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 432:
CGATTAGACT AATCTTGGGC GTATAAAAGA GGAAAGAGTG CCCAGGTCTT 
(2) INFORMATION FOR SEQ ID NO: 433: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human collagenase gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 433: 
CTGGAAGGGC AAGGACTCTA TATATACAGA GGGAGCTTCC TAGCTGGGAT 
(2) INFORMATION FOR SEQ ID NO: 434: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human stromelysin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 434:
CCAAACAAAC ACTGTCACTC TTTAAAAGCT GCGCTCCCGA GGTTGGACCT 
(2) INFORMATION FOR SEQ ID NO: 435: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human alpha-amylase gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 435: 
TCTGATCCGT GCAGGGTATT AATGTGTCAG GGCTGAGTGT TCTGAGATTT 
(2) INFORMATION FOR SEQ ID NO: 436: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human pancreatic alpha-amylase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 436:
TGTAAAATGT GCTTCTTACA GGAATATAAA TAGTTTCTGG AAAGGACACT 
(2) INFORMATION FOR SEQ ID NO: 437: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human pancreatic amylase gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 437: 
TGTAAAATGT GCTTCTTACA GGAATATAAA TAGTTTCTGG AAAGGACACT 
(2) INFORMATION FOR SEQ ID NO: 438: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytochrome P450c gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 438:
GCCACACGTA CAAGCCCGCC TATAAAGGTG GCAGTGCCTT CACCCTCACC 
(2) INFORMATION FOR SEQ ID NO: 439: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human cytochrome P-450c gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 439: 
GCCACACGTA CAAGCCCGCC TATAAAGGTG GCAGTGCCTT CACCCTCACC C 
(2) INFORMATION FOR SEQ ID NO: 440: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for cytochrome P(1)-450

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 440:
CACGTACAAG CCCGCCTATA AAGGTGGCAG TGCCTTCACC 
(2) INFORMATION FOR SEQ ID NO: 441: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human steroid 21-hydroxylase [P450
(C21)] B gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:441: 
GGATGGCTGG GGCTCTTGAG CTATAAGTGG CACCTCAGGG CCCTGACGGG 
(2) INFORMATION FOR SEQ ID NO: 442: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human mitochondrial aldehyde
dehydrogenase 2 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 442:
TTCCTGACCA TGGTACTTAT AAAAGCAGTG CCGTCTGCCC CATCCATGTC 
(2) INFORMATION FOR SEQ ID NO: 443: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human carbonic anhydrase III gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 443: 
AAGGCCATGC AAGTGTGCGG GGGAGCTACA TAAAAGCGCG GGCTCGCGCG 
(2) INFORMATION FOR SEQ ID NO: 444: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human creatine kinase B isozyme gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 444:
TGGGCGGCCC GCGTTGTGCC CCTTAAGAGC CGCGGGAGCG CGGAGCGGCC 
(2) INFORMATION FOR SEQ ID NO: 445: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human preproenkephalin A gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 445: 
CTTCGGTTTG GGGCTAATTA TAAAGTGGCT CCAGCAGCCG TTAAGCCCCG 
(2) INFORMATION FOR SEQ ID NO:446: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human preproenkephalin A gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 446:
CTTCGGTTTG GGGCTAATTA TAAAGTGGCT CCAGCAGCCG TTAAGCCCCG G
(2) INFORMATION FOR SEQ ID NO: 447: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human prepro form of corticotropin 
releasing factor gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 447: 
TTTTTGAAGA GGGTCGACAC TATAAAATCC CACTCCAGGC TCTGGAGTGG 
(2) INFORMATION FOR SEQ ID NO:448: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human preprothyrotropin-releasing
hormone gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 448:
GACCTCACTC GAGCCGCCGC CTGGCGCAGA TATAAGCGGC GGCCCATCTG 
(2) INFORMATION FOR SEQ ID NO: 449: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for fetal A-gamma and 
G-gamma hemoglobin 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 449: 
CGGCTGGCTA GGGATGAAGA ATAAAAGGAA GCACCCTTCA GCAGTTCCAC 
(2) INFORMATION FOR SEQ ID NO: 450: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene coding for ACTH and
beta-LPH precursors

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 450:
CCCACCAGGA GAGCTCGGCA AGTATATAAG GACAGAGGAG CGCGGGACCA 
(2) INFORMATION FOR SEQ ID NO:451: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human somatostatin I gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:451: 
TAGCCTGACG TCAGAGAGAG AGTTTAAAAC AGAGGGAGAC GGTTGAGAGC 
(2) INFORMATION FOR SEQ ID NO: 452: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human glucagon gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 452:
GTGAGGCTAA ACAGAGCTGG AGAGTATATA AAAGCAGTGC GCCTTGGTGC 
(2) INFORMATION FOR SEQ ID NO: 453: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human glucagon gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 453: 
GTGAGGCTAA ACAGAGCTGG AGAGTATATA AAAGCAGTGC GCCTTGGTGC A 
(2) INFORMATION FOR SEQ ID NO: 454: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human chorionic gonadotropin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 454:
AGGTGGAAAC ACTCTGCTGG TATAAAAGCA GGTGAGGACT TCATTAACTG 
(2) INFORMATION FOR SEQ ID NO: 455: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human chorionic gonadotropin gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 455: 
TTGAACTGTG GTGCAGGAAA GCCTCAAGTA GAGGAGGGTT GAGGCTTCAA 
(2) INFORMATION FOR SEQ ID NO: 456: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human beta-LH gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 456:
GCCGCCCCCA CAACCCCGAG GTATAAAGCC AGATACACGA GGCAGGGGAT 
(2) INFORMATION FOR SEQ ID NO: 457: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human follicle-stimulating hormone
beta-subunit gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 457: 
TAGTTGCACA TGATTTTGTA TAAAAGGTGA ACTGAGATTT CATTCAGTCT 
(2) INFORMATION FOR SEQ ID NO: 458: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human prolactin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 458:
TATTCATGAA GATATCAAAG GTTTATAAAG CCAATATCTG GGAAAGAGAA 
(2) INFORMATION FOR SEQ ID NO: 459: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human parathyroid gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:459: 
GACATCATCT GTAACAATAA AAGAGCCTCT CTTGGTAAGC AGAAGACCTA 
(2) INFORMATION FOR SEQ ID NO: 460: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Owl monkey insulin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 460:
GGGGAGATGG GCTCTGGGCC TATAAAGCCA GCAGGGACCC AGCAGCCCTC 
(2) INFORMATION FOR SEQ ID NO: 461: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: human insulin/IGF II gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:461: 
CCCCGCCTCC AGAGTGGGGG CCAAGGCTGG GCAGGCGGGT GGACGGCCGG 
(2) INFORMATION FOR SEQ ID NO: 462: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human insulin like growth factor 
IGFII gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 462:
AAAGAACTCT GCCTTGCGTT CCCCAAAATT TGGGCATTGT TCCGGCTCGC 
(2) INFORMATION FOR SEQ ID NO: 463: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human insulin-like growth factor II 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 463: 
CCCTGGGCCG CGGCTGGCGC GACTATAAGA GCCGGGCGTG GGCGCCCGCA 
(2) INFORMATION FOR SEQ ID NO: 464: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gastrin gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 464:
AGTTGGGAGG GACCTTGAGG GCTTTATAAG GCAGGCCTGG AGCATCAAGC 
(2) INFORMATION FOR SEQ ID NO:465: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interferon alpha gene 
INF-alpha 13 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 465: 
GGAAATCAGT ATGTTCCCTA TTTAAGGCAT CTGCAGGAAG CAAAGCCTTC
(2) INFORMATION FOR SEQ ID NO:466: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for leukocyte interferon

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 466:
GGAAGCTAGT ATGTTCCTTA TTTAAGACCT ATGCACAGAG CAAGGTCTTC 
(2) INFORMATION FOR SEQ ID NO: 467:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interferon alpha gene
INF-alpha 4b

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 467: 
GGAAATTAGT ATGTTCACTA TTTAAGACCT ATGCACAGAG CAAAGTCTTC 
(2) INFORMATION FOR SEQ ID NO: 468:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human gene for leukocyte (alpha)
interferon

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 468:

GGAAATTAGT ATGTTCACTA TTTAAGGCCT ATGCACAGAG CAAAGTCTTC 

(2) INFORMATION FOR SEQ ID NO: 469: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interferon genes LeIF-L and 
LeIF-J 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 469:
GGAAATTAGT ATGTTCACTA TTTAAGACCT ATGCACAGAG CAAAGTCTTC 
(2) INFORMATION FOR SEQ ID NO: 470: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human gene for fibroblast (beta-1)
interferon

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 470:
ATAGAGAGAG GACCATCTCA TATAAATAGG CCATACCCAC GGAGAAAGGA 
(2) INFORMATION FOR SEQ ID NO: 471: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human c-sis gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:471: 
CTCTCGCACT CTCCCTTCTC CTTTATAAAG GCCGGAACAG CTGAAAGGGT 
(2) INFORMATION FOR SEQ ID NO: 472: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human migratory inhibitory 
factor-related protein 8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 472:
CAGCTGGCCA AGCCTAACCG CTATAAAAAG GAGCTGCCTC TCAGCCCTGC 
(2) INFORMATION FOR SEQ ID NO: 473: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human migratory inhibitory 
factor-related protein 14 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 473: 
GTGCCCCAGT CAGGAGCTGC CTATAAATGC CGAGCCTGCA CAGCTCTGGC 
(2) INFORMATION FOR SEQ ID NO: 474: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human epidermal growth factor 
related gene 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 474: 
GGTCCCTCCT CCTCCCGCCC TGCCTCCCGC GCCTCGGCCC GCGCGAGCTA 
(2) INFORMATION FOR SEQ ID NO: 475: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 


(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human opsin gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 475:
GCTTAGGAGG GGGAGGTCAC TTTATAAGGG TCTGGGGGGG TCAGAACCCA 
(2) INFORMATION FOR SEQ ID NO:476: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human blue cone photoreceptor 
pigment gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 476:
TTTTGTGGGG TGGGAGGATC ACCTATAAGA GGACTCAGAG GAGGGTGTGG 
(2) INFORMATION FOR SEQ ID NO: 477: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human red cone photoreceptor pigment 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 477: 
CGGGCTGATC CCACAGGCCA GTATAAAGCG CCGTGACCCT CAGGTGATGC 
(2) INFORMATION FOR SEQ ID NO: 478: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human green cone photoreceptor
pigment gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 478:
CGGGCTGATC CCACTGGCCG GTATAAAGCG CCGTGACCCT CAGGTGACGC 
(2) INFORMATION FOR SEQ ID NO: 479: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interferon-inducible gene
IFI-56K 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 479: 
TTGGCTGCTG TTTAGCTCCC TTATATAACA CTGTCTTGGG GTTTAAACGT 
(2) INFORMATION FOR SEQ ID NO: 480: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human interferon-induced 15-Kd
protein (ISG) gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 480:
GACGTGTGTG CCTCAGGCTT AATAATAGGG CCGGTGCTGC TGCGGAAGCC 
(2) INFORMATION FOR SEQ ID NO: 481: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human ubiquitin-like protein (GdX) 
gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 481:
TCCAGCGCGC GCGCCCGGGG CGGCGGCGCG CGGCGGGGGG TGGTTGGGGT 
(2) INFORMATION FOR SEQ ID NO: 482: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human exogenous retrovirus erv3 5'
long terminal repeat

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 482:
CCGCCCCTGT TGGTTGCATG TATAAAAGTC AAGCCCTGTC ATTGTTCAGG 
(2) INFORMATION FOR SEQ ID NO: 483: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Bovine leukemia virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 483: 
ACCTCACCTG CTGATAAATT AATAAAATGC CGGCCCTGTC GAGTTAGCGG 
(2) INFORMATION FOR SEQ ID NO: 484: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human T-cell lymphotropic virus type I

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 484:
TCAATAAACT AGCAGGAGTC TATAAAAGCG TGGAGACAGT TCAGGAGGGG 
(2) INFORMATION FOR SEQ ID NO: 485: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human T-cell leukemia virus II 
proviral LTR 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 485: 
TCAAAATAAA AGATGCCGAG TCTATAAAAG CGCAAGGACA GTTCAGGAGG 
(2) INFORMATION FOR SEQ ID NO: 486: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human T-cell Lymphotropic virus
III (HIV-1)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 486:
GGCGAGCCCT CAGATCCTGC ATATAAGCAG CTGCTTTTTG CCTGTACTGG 
(2) INFORMATION FOR SEQ ID NO: 487: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: AIDS-associated retrovirus
(arv-2; proviral)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 487: 
TGGCGTCCCT CAGATGCTGC ATATAAGCAG CTGCTTTTTG CCTGTACTGG 
(2) INFORMATION FOR SEQ ID NO:488: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human immunodeficiency virus type 2
(HIV-2)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 488:
GCCCTCATAT TCTCTGTATA AATATACCCG CTAGCTTGCA TTGTACTTCG 
(2) INFORMATION FOR SEQ ID NO: 489:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic)

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Visna lentivirus, Icelandic strains 
LV1-1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 489: 
CATAACCGCA GATGTAAACA AGTTGCCTAT ATAAGCCGCT TGCTAGCTGG 
(2) INFORMATION FOR SEQ ID NO: 490: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human cytomegalovirus strain AD169
gene I

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 490:
GGCGTGTACG GTGGGAGGTC TATATAAGCA GAGCTCGTTT AGTGAACCGT 
(2) INFORMATION FOR SEQ ID NO: 491: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Murine cytomegalovirus 
intermediate-early gene I 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 491: 
GCTGAGCTGC GTTCACGTGG GTATAAGAGG CGCGACCAGC GTCGGTACCG 
(2) INFORMATION FOR SEQ ID NO: 492: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytomegalovirus
intermediate-early glycoprotein UL37

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 492:
CGTCATGTCC GGCATCTTCA TGTATATAAG ACGGTGTTTC AAGACGACGT 
(2) INFORMATION FOR SEQ ID NO: 493: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytomegalovirus I-E 
glycoprotein US3 gene 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 493: 
ACAACGTCAC CAAGAAACGC TATATATTCA AAAACACCGT TCAGTCCACA 
(2) INFORMATION FOR SEQ ID NO: 494: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes Simplex Virus type 1 gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 494:
TTTGGGGAGG GGAAAGGCGT GGGGTATAAG TTAGCCCTGG CCCGACAGTC 
(2) INFORMATION FOR SEQ ID NO: 495: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes Simplex Virus type 1 gene II 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 495:
AGCCGGCCCC GGCACCACGG GTATAAGGAC ATCCACCACC CGGCCGGTGG 
(2) INFORMATION FOR SEQ ID NO: 496: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus type II I-E
gene II

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 496:
AGCCGGCCCC GGTCGTGCGG GTATAAGGGC AGCCACCGGC CCACTGGGCG 
(2) INFORMATION FOR SEQ ID NO: 497: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus type I I-E gene 
III 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 497:
TTCCCGCCGG CCCCTGGGAC TATATGAGCC CGAGGACGCC CCGATCGTCC 
(2) INFORMATION FOR SEQ ID NO: 498: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Herpes simplex virus type II I-E
gene III

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 498:
CCCCGCGCGC CCCGAGCGAC TATATCAGCC AGGCGACGGG GCGATCGTCC 
(2) INFORMATION FOR SEQ ID NO: 499: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus type 1 I-E 
genes IV and V 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:499: 
GGGGGCGGGT CTCTCCGGCG CACATAAAGG CCCGGCGCGA CCGACGCCCG 
(2) INFORMATION FOR SEQ ID NO: 500: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Herpes simplex virus type 2 I-E
genes IV and V

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 500:
ACGGGGGGCG GGCCGTTCCT CGCGCACATA AAGGGCCGGC GTCCCGGTCG 
(2) INFORMATION FOR SEQ ID NO: 501: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytomegalovirus DNA 
polymerase gene

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 501: 
TAGGCGGGCT GGAAAGATGA TGTATAAATA GAGTCTGCGA CGGGGTTCGG 
(2) INFORMATION FOR SEQ ID NO: 502: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytomegalovirus b' 2.2 kb
transcript (start 160513)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 502:
TAGGCGGGCT GGAAAGATGA TGTATAAATA GAGTCTGCGA CGGGGTTCGG 
(2) INFORMATION FOR SEQ ID NO: 503: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytomegalovirus 2.7 kb 
transcript (start 4578) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:503: 
GCCCGCGCTC GGCAGAGCTA CCATATAAAA ACGCAGGGGT TTAGCAGCTT 
(2) INFORMATION FOR SEQ ID NO: 504: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' 82K AlkExo
(start 27048)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 504:
CAGCACCAGG AGAGGCTTAA GCTCGGGAGG CAGCGCCACC GACGACAGTA 
(2) INFORMATION FOR SEQ ID NO: 505: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' 42K gene 
(start site 106547) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 505: 
ATGGGTTGTG GTTATATGCA CTTCCTATAA GACTCTCCCC CACCGCCCAC 
(2) INFORMATION FOR SEQ ID NO: 506: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 39k dUTPase
gene (start 106811)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 506:
CGTGTGCGAT AATACACACG CCCATCGAGG CCATGCCTAC ATAAAAGGGC 
(2) INFORMATION FOR SEQ ID NO: 507: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' 33K (start
site 145165)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 507:
GGCCGGGCGA CCCAGATGTT TACTTAAAAG GCGTGCCGTC CGCCGGCATG 
(2) INFORMATION FOR SEQ ID NO: 508: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 21K (start
site 145459)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 508:
CGACGTACGC GATGAGATCA ATAAAAGGGG GCGTGAGGAC CGGGAGGCGG 
(2) INFORMATION FOR SEQ ID NO: 509: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' 5 kb 
transcript (start 86216) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 509: 
CCCCACCCCT GCGCGATGTG GATAAAAAGC CAGCGCGGGT GGTTTAGGGT 
(2) INFORMATION FOR SEQ ID NO: 510: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' RNR2 gene
(start 89774)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 510:
GGTCCGCCTT CTGGTCCACG CATATAAGCG CGGACTAAAA ACAGGGATGT 
(2) INFORMATION FOR SEQ ID NO: 511: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-2 RNR2 gene 
(start site 247) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 511: 
TGGTCCGCCT TCTCGTCCAC GCATATAAGC GCGGCCTGAA GACGGGGATG 
(2) INFORMATION FOR SEQ ID NO: 512: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' tk gene
(start site 47911)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 512:
CGCGGTCCCA GGTCCACTTC GCATATTAAG GTGACGCGTG TGGCCTCGAA 
(2) INFORMATION FOR SEQ ID NO: 513: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-2 b' tk gene 
(start site 225) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 513: 
CGCGGCCCGA GGTCCACTTC GCATATTAAG GTGACGCGCG TGGCCTCGAA 
(2) INFORMATION FOR SEQ ID NO: 514: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' dbp gene
(start site 62318)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 514:
CGGCACGCCC CCAGGTAAAG TGTACATATA CCAACCGCAT ACCAGACGCA 
(2) INFORMATION FOR SEQ ID NO: 515: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' gB (3.3
Kb) start 56081

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 515: 
CCACTCAGCG CGCCGCCTGG CGATATATTC GCGAGCTGAT TATCGCCACC 
(2) INFORMATION FOR SEQ ID NO: 516: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' gD (start
138337)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 516:
CCACTCAGCG CGCCGCCTGG CGATATATTC GCGAGCTGAT TATCGCCACC 
(2) INFORMATION FOR SEQ ID NO: 517: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-2 b' gD (start 
5918) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 517: 
GGAGTATAAT AGAGTCTTTG TGTTTAAAAC CCGGGCTCGG TGTGGTGTTC 
(2) INFORMATION FOR SEQ ID NO: 518: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' gE (start
site 141171)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 518:
GGAGAGGGCC CGCGGCGCAT TTAAGGCGTT GTTGTGTTGA CTTTGCCTCT 
(2) INFORMATION FOR SEQ ID NO: 519: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 ICP gene 
(start site 58361) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 519: 
AATTATTGCT ACGACATCCG TGCTTGTTTG TGTTCCGTGT CTATATCTCT 
(2) INFORMATION FOR SEQ ID NO: 520: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b' tr-4
(start site 136729)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 520:
GGCGGTGCTG TTTGCGGGTT GGCACAAAAA GACCCCGATC CGCGTCTGTG 
(2) INFORMATION FOR SEQ ID NO: 521: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 [U-S] b' tr 
(start 143245) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 521: 
GTGACGTCAA TTGCCCGAGC CGCATAAAGG GCCGGTGGTC CGCCTAGCCG 
(2) INFORMATION FOR SEQ ID NO: 522: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b'g' VPS
(start 40768)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 522:
GGGGTGGGGC GGGGGGGGGG GTATATAAGG CCTGGGATCC CACGTCCCCG 50 
(2) INFORMATION FOR SEQ ID NO: 523: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b'g' 2.1kb
transcript (start 26639) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 523: 
CCCGTTAACC CCCCACGTGA TCAGCACGCC ACCGACACCG CAGACGAAAA 50 
(2) INFORMATION FOR SEQ ID NO: 524: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b'g'
a'TIF/VSP (start 105259)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 524:
GGGGCGGCCC GTGCGGGTTG CTTAAATGCG TGGTGGCGAC CACGGGCTGT 
(2) INFORMATION FOR SEQ ID NO: 525: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 b'g' 2.7 kb 
transcript (start 100998) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:525: 
GCCACGCCCA TAAGCTCCTC CCGATAAAAA CCGCCCCGAT GGCCCTGGAC 
(2) INFORMATION FOR SEQ ID NO: 52 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human cytomegalovirus UL36 gene
(start 49862)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 526:
GACGTCAACG CTGATAGTGT CTATAAAGGC CGTGCCGCCG CGCCGTAGTT 50 
(2) INFORMATION FOR SEQ ID NO: 527: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytomegalovirus g' pp65 gene 
(start 121072) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 527:
TCCGCGTTTG GTCGCCTGCC TATGTAAGGC GGCGGCCGCA GAGGGCGCGC 50 
(2) INFORMATION FOR SEQ ID NO: 528: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytomegalovirus g' pp71 gene
(start 119223)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 528:
GTCACCGCTG CTATATTTGC GACAGTTGCC GGAACCCTTC CCGACCTCCC 
(2) INFORMATION FOR SEQ ID NO:529: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human cytomegalovirus g' pp150 gene
(start 43092) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:529: 
CGTATCCGCC TCCGCTATTA AACTACCCCC CCTCCCTCTA GGTGGGGCGC 
(2) INFORMATION FOR SEQ ID NO:530: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 g' 5 Kb
transcript (start 103313)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 530:
TTGTGTCGCA GGGCGGCCCG CGTATAAAGG CGAGAGCGCG GGACCGTTTC 
(2) INFORMATION FOR SEQ ID NO: 531: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 g' gC (start 
96170) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 531: 
AACCCCGGAT GGGGCCCGGG TATAAATTCC GGAAGGGGAC ACGGGCTACC 
(2) INFORMATION FOR SEQ ID NO: 532:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-2 g' gC (start
670)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 532:
GCGGGGGTGC CGTGGACGGG TATAAAGGCC AGGGGGGCAC GCGGGCCCAT 
(2) INFORMATION FOR SEQ ID NO: 533: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 g' gH gene 
(start 46581) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 533: 
CGGCAATAAA AAGACAGAAT AAAACGCACG GGTGTTGGGT CGTTTGTTCA 
(2) INFORMATION FOR SEQ ID NO: 534: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 g' 42K
(start site 107130)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 534:
CCGGAGTCCC CGCTAACCTT CGGCATAAAA GCCACCGCGC GCCTGTTGAC 
(2) INFORMATION FOR SEQ ID NO: 535: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 Ori_s ORF 
(start site 132287) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 535: 
CGGAGGCCCC CGGGGTGCGT CCCCTGTGTT TCGTGGGTGG GGTGGGCGGG 
(2) INFORMATION FOR SEQ ID NO: 536: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-1 18 K (start
site 97951)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 536:
CCCGCCCACC GCTGGGCGCT ATAAAGCCCC CACCCTCTCT TCCCTCAGGT 
(2) INFORMATION FOR SEQ ID NO: 537: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Herpes simplex virus-2 18K (start 
site 2391) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:537: 
CCCCGCCGTC CCCCGGGCGT TATAAGCCGC CGCACTCGCT TTTCCCACCG 
(2) INFORMATION FOR SEQ ID NO: 538: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus L1 1Kb gene
(start site 103194) 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 538:
TGGTGCCTTG GCTTTAAAGG GGAGATGTTA GACAGGTAAC TCACTAAACA 
(2) INFORMATION FOR SEQ ID NO: 539: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus R1 145K gene
(start site 1721) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 539: 
ACTGTATAAA GGTAAGTATT ATTAAATTTT AGAGACACTA TCACGTGTAA 
(2) INFORMATION FOR SEQ ID NO: 540: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Epstein Barr virus R1 20K (start
site 9660)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 540:
CTTTTAGCCA TGCCATGCTC TATAAATCAC TTCCCTATCT CAGGTAGGCC 
(2) INFORMATION FOR SEQ ID NO: 541: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [DL/R] (start
site 52787)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 541: 
ACAGAGACCC CAAAAAGAGG ATAAAAGAAG GCGAGCCGGC CCGGCTCGCC 
(2) INFORMATION FOR SEQ ID NO: 542: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus R2 (start site
61372)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 542:
GTGACGGTCA GGCAGCTCCT GTATTTAACT TTGCGGACAG AGGCCAGAGC 
(2) INFORMATION FOR SEQ ID NO: 543: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus L2 (start site 
57050) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 543: 
TAATTACGCT TGTGTACATA TTTAAATCCA CACAAGTGGC CAGAGTGGGC 
(2) INFORMATION FOR SEQ ID NO: 544: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus R1 (start site
88539)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 544:
GACAGGGACG GCGGCGCTAT ATATAAGAGC CCAAGACCCG GCTCTCTTTA 
(2) INFORMATION FOR SEQ ID NO:545: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus R2 (start site 
88897) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 545: 
CGGATTAGAT GGGGATATTT AAAAGGGGCA GCAATCTCGG CTGTTTGTAC 
(2) INFORMATION FOR SEQ ID NO: 546: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Epstein Barr virus L2 (start site
90021)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 546:
ACCCAACAGG TGGTGAAAAT ATAACACAGG TGACACCAGC CTCTATCAGC 
(2) INFORMATION FOR SEQ ID NO: 547: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [BamHI-L] L1
(start site 92157) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 547: 
ACCCCCCTTG TACCTATTAA AGAGGATGCT GCCTAGAAAT CGGTGCCGAG 
(2) INFORMATION FOR SEQ ID NO: 548: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [BamHI-L] L3
(start site 88480)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 548:
CGGGTCTTGG GCTCTTATAT ATAGCGCCGC CGTCCCTGTC TGTTAGATCA
(2) INFORMATION FOR SEQ ID NO: 549: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [BamHI-K] 2.1 Kb
(start site 109939) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 549: 
AGACGCCCTC AATCGTATTA AAAGCCGTGT ATTCCCCCGC ACTAAAGAAT 
(2) INFORMATION FOR SEQ ID NO: 550: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [BamHI-K] 1.3kb
transcript (start 110632)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 550:
TTGCGACCCC TCTGATATTA AGGTGGTTAT TTTGGGCCAG GACCCCTATC 
(2) INFORMATION FOR SEQ ID NO: 551: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [EcoRI-H] L1
(start site 137680)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 551: 
CGGTGCCCGG ACTCAGAATT ATTAAACCGG GTGGCAGCTC CTGGCAGTCA 
(2) INFORMATION FOR SEQ ID NO: 552: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [EcoRI-D] L1
(start site 159337)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 552:
AAGGGCAGGG GGTGGGTATT TAAGGATCTA TATGCCCTTC TCTACCTGCA 
(2) INFORMATION FOR SEQ ID NO: 553: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [EcoRI-D] R1
(start 165496) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 553: 
AATGGGCGTG GCAGAATAGT ATAAGACGCG AGGCCTGGGT GAGGAGAGTC 
(2) INFORMATION FOR SEQ ID NO: 554: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [EcoRI-D] L2
(start 167495)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 554:
TCTTTCCTTG TCCTTACTGT ATAAAAGTCC ACGAAAACAG CTGTGCCTCA 
(2) INFORMATION FOR SEQ ID NO: 555: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [EcoRI-D] L1A
(start 169165)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 555: 

ACTGATGAGT AAGTATTACA CCCTTTGCCC CACACCCCCT TTCCCTTACT 

(2) INFORMATION FOR SEQ ID NO: 556: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [EBNA] E1 (start
site 11333)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 556:
AGGGGGGGAC TAAGGTCCCA CTACAAAAAC TCTGTGTTCT GCTGCAAATT 
(2) INFORMATION FOR SEQ ID NO: 557: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [EBNA] E2 (start
site 14399)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 557:
GGTATAAAGT GGTCCTGCAG CTATTTCTGG TCGCATCAGA GCGCCAGGAG 
(2) INFORMATION FOR SEQ ID NO: 558: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [EcoRI-D] L1
(start site 169514)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 558:
CTCTGACGTA GCCGCCCTAC ATAAGCCTCT CACACTGCTC TGCCCCCTTC 
(2) INFORMATION FOR SEQ ID NO: 559: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type 2 E1a (start 498)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 559: 
GTCAGCTGAC GCGCAGTGTA TTTATACCCG GTGAGTTCCT CAAGAGGCCA 
(2) INFORMATION FOR SEQ ID NO: 560:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-5 E1a (start 499)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 560:
GTCAGCTGAC GTGTAGTGTA TTTATACCCG GTGAGTTCCT CAAGAGGCCA 
(2) INFORMATION FOR SEQ ID NO: 561: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-7 E1a (start site
512) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 561: 
TCAGCTGATC GCTAGGGTAT TTAAACCTGA CGAGTTCCGT CAAGAGGCCA 
(2) INFORMATION FOR SEQ ID NO: 562: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-12 E1a (start site
306)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 562:
AAATTGATGA CGGCAATTTT ATTATAGGCG CGGAATATTT ACCGAGGGCA 50 
(2) INFORMATION FOR SEQ ID NO: 563: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-12 E1a (start site
445) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:563: 
GTCAGCTGAT CGTTTGGGTA TTTAATGCCG CCGTGTTCGT CAAGAGGCCA 50 
(2) INFORMATION FOR SEQ ID NO: 564: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Simian Adenovirus SA7 E1a (start
site 440)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 564:
TTATTGTCTA GGTGAGGGTA TTTAAACCGG CTCAGACCGT CAAGAGGCCA 
(2) INFORMATION FOR SEQ ID NO: 565: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-2 E1b (start 1700)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 565: 
GGGGCGGGGC TTAAAGGGTA TATAATGCGC CGTGGGCTAA TCTTGGTTAC 
(2) INFORMATION FOR SEQ ID NO: 566: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-5 E1b (start site
1703)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 566:
GGGGCGGGGC TTAAAGGGTA TATAATGCGC CGTGGGCTAA TCTTGGTTAC 50 
(2) INFORMATION FOR SEQ ID NO: 567: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-7 E1b (start site
1577)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 567:
TTCTTGGGTG GGGTCTTGGA TATATAAGTA GGAGCAGATC TGTGTGGTTA 50 
(2) INFORMATION FOR SEQ ID NO: 568: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-12 E1b (start site
1527)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 568:
TGGGCGTGGT TAAACAGGGA TATAAAGCTG GGTTGGTGTT GCTTTGAATA 
(2) INFORMATION FOR SEQ ID NO: 569: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-2 EII (start site 
27092) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 569: 
GAAAGGGCGC GAAACTAGTC CTTAAGAGTC AGCGCGCAGT ATTTGCTGAA 
(2) INFORMATION FOR SEQ ID NO: 570: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-2 EIII (start site
27610)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 570:
TGCGGTCGCC CGGGCAGGGT ATAACTCACC TGAAAATCAG AGGGCGAGGT 
(2) INFORMATION FOR SEQ ID NO: 571: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-5 EIII (start site
239)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 571:
TGCGGTCGCC CGGGCAGGGT ATAACTCACC TGACTCTTGG AGGGCGAGGT 
(2) INFORMATION FOR SEQ ID NO: 572: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-2 EIV (start site
35611)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 572:
TTACGTCATT TTTTAGTCCT ATATATACTC GCTCTGTACT TGGCCCTTTT 
(2) INFORMATION FOR SEQ ID NO: 573: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-2 IVa2 (start site 
5827) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 573:
CCCTCCCACT TAGCCTCCTT CGTGCTGGCC TGGACGCGAG CCTTCGTCTC 
(2) INFORMATION FOR SEQ ID NO: 574: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-5 IVa2 (start site
5837)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 574:
CCCTCCCACT TAGCCTCCTT CGTGCTGGCC TGGACGCGAG CCTTTGTCTC 
(2) INFORMATION FOR SEQ ID NO: 575: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-7 IVa2 (start 
site 5692) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 575:
CCCTCCCACG TGGCCTCCTT TGTGCTGGCC TGGACACGCG CTTTTGTATC 
(2) INFORMATION FOR SEQ ID NO: 576: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-2 IX (start site
3575)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 576:
GCTTAAGGGT GGGAAAGAAT ATATAAGGTG GGGGTCTCAT GTAGTTTTGT 
(2) INFORMATION FOR SEQ ID NO: 577: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-5 IX (start site 
3581) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:577: 
GCTTAAGGGT GGGAAAGAAT ATATAAGGTG GGGGTCTTAT GTAGTTTTGT 
(2) INFORMATION FOR SEQ ID NO: 578: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-7 IX (start site
3460)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 578:
ATGGGGACTT TCAGGTTGGT AAGGTGGACA AATTGGGTAA ATTTTGTTAA 
(2) INFORMATION FOR SEQ ID NO: 579: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-2 major late 
(start site 6039) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 579: 
GTGTTCCTGA AGGGGGGCTA TAAAAGGGGG TGGGGGCGCG TTCGTCCTCA 
(2) INFORMATION FOR SEQ ID NO: 580: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-5 major late (start
site 6049)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 580:
GTGTTCCTGA AGGGGGGCTA TAAAAGGGGG TGGGGGCGCG TTCGTCCTCA 
(2) INFORMATION FOR SEQ ID NO: 581: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-7 major late (start
site 5904) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 581: 
GGGTCCCCGC CGGGGGGGTA TAAAAGGGGG CGGACCTCTG TTCGTCCTCA 
(2) INFORMATION FOR SEQ ID NO: 582: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Adenovirus type-12 major late
(start site 972)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 582:
AATTTTCTGG TGGTGGGCTA TAAAAAGGGG CGGGTCCTTG GTCTTCATCG 
(2) INFORMATION FOR SEQ ID NO: 583: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Adenovirus type-2 IIIa (start site
25954) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:583: 
GGCGTGGTAG TCCTCAGGTA CAAATTTGCG AAGGTAAGCC GACGTCCACA 
(2) INFORMATION FOR SEQ ID NO: 584: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human papilloma virus type 18 E6
gene (start site 30)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 584:
CAGCACATAC TATACTTTTC ATTAATACTT TTAACAATTG TAGTATATAA 
(2) INFORMATION FOR SEQ ID NO: 585: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human papilloma virus type-16 E6/E7
(start site 97)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 585: 
GAACCGAAAC CGGTTAGTAT AAAAGCAGAC ATTTTATGCA CCAAAAGAGA 
(2) INFORMATION FOR SEQ ID NO: 586: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Human papilloma virus type-18 E6
(start site 105)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 586:
GGACCGAAAA CGGTGTATAT AAAAGATGTG AGAAACACAC CACAATACTA 
(2) INFORMATION FOR SEQ ID NO: 587: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Parvovirus h-1 H-1 [+.04] (start
site 209) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 587: 
AGTGGGCGTG GCTAACTGTA TATAAGCAGT CACTCTGGTC GGTTACTCAC 
(2) INFORMATION FOR SEQ ID NO: 588: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Parvovirus h-1 H-1 [+.40] (start
site 2010)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 588:
GCCGAAGCTA GACACTCCTA TAAATTCGCT AGGTTCAATG CGCTCACCAT 
(2) INFORMATION FOR SEQ ID NO: 589: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Human parvovirus B19-Au B19 [0.06] 
(start site 347) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 589: 
GAGCGTAGGC GGGGACTACA GTATATATAG CACGGTACTG CCGCAGCTCT 
(2) INFORMATION FOR SEQ ID NO: 590: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Simian virus 40 T/t late (start
site 31)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 590:
CCGCCCCTAA CTCCGCCCAG TTCCGCCCAT TCTCCGCCCC ATGGCTGACT 
(2) INFORMATION FOR SEQ ID NO: 591: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Simian virus 40 T/t early P2 (start
site 5233) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 591: 
TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC GCCTCGGCCT 
(2) INFORMATION FOR SEQ ID NO: 592: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: BK virus T/t early (start site 99)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 592:
CCTGTGGCCT TTTTTTTTAT AATATATAAG AGGCCGAGGC CGCCTCTGCC 
(2) INFORMATION FOR SEQ ID NO: 593: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Polyoma virus T/t E (start site
156)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 593: 
GGCCACCCAA ATTGATATAA TTAAGCCCCA ACCGCCTCTT CCCGCCTCAT 
(2) INFORMATION FOR SEQ ID NO: 594: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Simian virus 40 late (start site
325)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 594:
GTTCTTTCCG CCTCAGAAGG TACCTAACCA AGTTCCTCTT TCAGAGGTTA 
(2) INFORMATION FOR SEQ ID NO: 595: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Hepatitis B virus subtype adr4 3.6 kb
P1 (start 1659)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 595: 
AGTTGGGGGA GGAGATTAGG TTAAAGGTCT TTGTACTAGG AGGCTGTAGG 
(2) INFORMATION FOR SEQ ID NO: 596: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Hepatitis B virus subtype adr4 3.6 kb
P2 (start 1690)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 596:
TGTACTAGGA GGCTGTAGGC ATAAATTGGT CTGTTCACCA GCACCATGCA 
(2) INFORMATION FOR SEQ ID NO: 597: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Hepatitis B virus subtype adr4 2.2
kb P1 (start 3061)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 597: 
ATCGGCACTC AGGAAGACAG CCTACTCCCA TCTCTCCACC TCTAAGAGAC 
(2) INFORMATION FOR SEQ ID NO: 598: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Hepatitis B virus subtype adr4 2.2
kb P2 (start 3092)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 598:

CTCTCCACCT CTAAGAGACA GTCATCCTCA GGCCATGCAG TGGAACTCCA 

(2) INFORMATION FOR SEQ ID NO: 599: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Epstein Barr virus [BamHI-F] R1
(start 58862)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 599:
TATTTTTGAA AAGGGATATT ATAAAACAGG TCATTGCTCG GATTGTGGCA 
(2) INFORMATION FOR SEQ ID NO: 600: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: Promoter Sequence of IL-13

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 600:
GGTGTGAGGC GTCACCACTT GGGCCTATAA AAGCTGCCAC AAGACGCCAA GGCCAC 



(2) INFORMATION FOR SEQ ID NO: 601: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: UL9 BINDING SITE, HSV oris, higher 
affinity 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 601: 
CGTTCGCACT T 

(2) INFORMATION FOR SEQ ID NO: 602: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: UL9 BINDING SITE, HSV oris, lower 
affinity 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 602: 
TGCTCGCACT T 

(2) INFORMATION FOR SEQ ID NO: 603: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: UL9 Z1 TEST SEQ. / UL9 ASSAY SEQ.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 603: 
GCGCGCGCGC GTTCGCACTT CCGCCGCCGG 
(2) INFORMATION FOR SEQ ID NO: 604: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: UL9 Z2 TEST SEQ. / UL9 ASSAY SEQ.



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 604: 
GGCGCCGGCC GTTCGCACTT CGCGCGCGCG 
(2) INFORMATION FOR SEQ ID NO: 605: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: UL9 CCCG TEST SEQ. / UL9 ASSAY
SEQ.



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 605: 
GGCCCGCCCC GTTCGCACTT CCCGCCCCGG 
(2) INFORMATION FOR SEQ ID NO: 606: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: UL9 GGGC TEST SEQ. / UL9 ASSAY 
SEQ. 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 606: 
GGCGGGCGCC GTTCGCACTT GGGCGGGCGG 
(2) INFORMATION FOR SEQ ID NO: 607: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: UL9 ATAT TEST SEQ. / UL9 ASSAY 
SEQ. 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 607:
GGATATATAC GTTCGCACTT TAATTATTGG
(2) INFORMATION FOR SEQ ID NO: 608:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: UL9 polyA TEST SEQ. / UL9 ASSAY 
SEQ. 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 608: 
GGAAAAAAAC GTTCGCACTT AAAAAAAAGG 
(2) INFORMATION FOR SEQ ID NO: 609: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: UL9 polyT TEST SEQ. / UL9 ASSAY
SEQ.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 609:
GGTTTTTTTC GTTCGCACTT TTTTTTTTGG 
(2) INFORMATION FOR SEQ ID NO: 610: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: UL9 GCAC TEST SEQ. / UL9 ASSAY
SEQ.



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 610: 
GGACGCACGC GTTCGCACTT GCAGCAGCGG 
(2) INFORMATION FOR SEQ ID NO: 611: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO



(iv) ANTI-SENSE: NO 




(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: UL9 ATori-1 Test sequence / UL9 
ASSAY SEQ. 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 611:
GCGTATATAT CGTTCGCACT TCGTCCCAAT 
(2) INFORMATION FOR SEQ ID NO: 612: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: oriECO2 TEST SEQ. / UL9 ASSAY SEQ.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 612:
GGCGAATTCG ACGTTCGCAC TTCGTCCCAA T 
(2) INFORMATION FOR SEQ ID NO: 613: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: oriECO3 TEST SEQ. / UL9 ASSAY SEQ.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 613: 
GGCGAATTCG ATCGTTCGCA CTTCGTCCCA AT 
(2) INFORMATION FOR SEQ ID NO: 614: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: WILD TYPE 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 614:
AAGTGAGAAT TCGAAGCGTT CGCACTTCGT CCCAAT 
(2) INFORMATION FOR SEQ ID NO: 615: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: TRUNCATED UL9 BINDING SITE, COMPARE 
SEQ ID NO: 601 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 615: 
TTCGCACTT 

(2) INFORMATION FOR SEQ ID NO: 616: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: HSVB1/4, SEQUENCE OF COMPETITOR DNA 
MOLECULE 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 616: 
GGTCGTTCGC ACTTCGC 

(2) INFORMATION FOR SEQ ID NO: 617:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Figure 14B, top strand of an
exemplary target sequence

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 617: 
GCGTANNNNN CGTTCGCACT TNNNNCTTCG TCCCAAT 
(2) INFORMATION FOR SEQ ID NO: 618: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: HSV primer 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 618: 
ATTGGGACGA AG 

(2) INFORMATION FOR SEQ ID NO: 619:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: a sample distamycin target sequence 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 619: 
TTCCTCCTTT C 

(2) INFORMATION FOR SEQ ID NO: 620: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: a distamycin target sequence 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 620: 
TTCCNNNTTT C 

(2) INFORMATION FOR SEQ ID NO: 621:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: YES 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Figure 27A, test oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 621: 
GCGTANNNNN CGTTCGCACT TNNNNCTTCG TCCCAAT 37 
(2) INFORMATION FOR SEQ ID NO: 622: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: YES 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Figure 27B, oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 622: 
GCGTANNNNN CGTTCGCACT TNNNNCTTCG TCCCAAT 37 
(2) INFORMATION FOR SEQ ID NO: 623:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: YES 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Figure 27C, oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 623:
GCGTANNNNN TTCACGCTTG CNNNNCTTCG TCCCAAT 
(2) INFORMATION FOR SEQ ID NO: 624: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: YES 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Figure 27D, oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 624: 

GCGTANNNNN TTCACGCTTG CNNNNCTTCG TCCCAAT 

(2) INFORMATION FOR SEQ ID NO: 625: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 base pairs 
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(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: -35 region consensus sequence 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 625: 
TTGACA 

(2) INFORMATION FOR SEQ ID NO: 626: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: -10 region consensus sequence 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 626: 
TATAAT 

(2) INFORMATION FOR SEQ ID NO: 627: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 242 base pairs 

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: HIV-1, LTR sequence, Figure 28 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 627: 

GTTAGAGTGG AGGTTTGACA GCCGCCTAGC ATTTCATCAC ATGGCCCGAG AGCTGCATCC 60 

GGAGTACTTC AAGAACTGCT GACATCGAGC TTGCTACAAG GGACTTTCCG CTGGGGACTT 120 

TCCAGGGAGG CGTGGCCTGG GCGGGACTGG GGAGTGGCGA GCCCTCAGAT CCTGCATATA 180 

AGCAGCTGCT TTTTGCCTGT ACTGGGTCTC TCTGGTTAGA CCAGATCTGA GCCTGGGAGC 240 

TC 242 
(2) INFORMATION FOR SEQ ID NO: 628: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: a TFIID binding site 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 628: 



CCTGCATA 8 



(2) INFORMATION FOR SEQ ID NO: 629: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: a TFIID binding site 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 629: 
AAGCAGCT 8 
(2) INFORMATION FOR SEQ ID NO: 630: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 29A 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 630: 
GCAGAATTCT GCAG 

(2) INFORMATION FOR SEQ ID NO: 631: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 29A 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:631: 
GCAGAATTCT GCAGCGTTCG CACTTTCTAG AGCTCAGG 38 
(2) INFORMATION FOR SEQ ID NO: 632: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 29A 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 632: 
AGATCTCGAG TCC 

(2) INFORMATION FOR SEQ ID NO: 633: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 29B 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 633: 
GCAGAATTCT GCAGNNNNCG TTCGCACTTT CTAGAGCTCA GG 
(2) INFORMATION FOR SEQ ID NO: 634: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 29C 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 634: 
GCAGAATTCT GCAGNNNNNN NNCGTTCGCA CTTTCTAGAG CTCAGG 
(2) INFORMATION FOR SEQ ID NO: 635: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 29D 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 635: 
GCAGAATTCT GCAGCGTTCG CACTTNNNNN NNNTCTAGAG CTCAGG 
(2) INFORMATION FOR SEQ ID NO: 636: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 30 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 636: 
CGTGAATTCT GCAG 

(2) INFORMATION FOR SEQ ID NO: 637: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 30 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 637: 
CGTGAATTCT GCAGATG 

(2) INFORMATION FOR SEQ ID NO: 638: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 30 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 638: 
CGTGAATTCT GCAGATGAGG TACCNNNNNN CGTTCGCACT TTCTAGAGCT CTCC 
(2) INFORMATION FOR SEQ ID NO: 639: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 30 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 639: 
GTGAAAGATC TCGAGAGG 
(2) INFORMATION FOR SEQ ID NO: 640: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: Oligonucleotide, Figure 30 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 640: 
AAGATCTCGA GAGG 
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(2) INFORMATION FOR SEQ ID NO: 641: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(C) INDIVIDUAL ISOLATE: UL9 BINDING SITE, HSV oriS 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 641: 
CGTTCTCACT T 
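
The LENGTH field of each entry above can be cross-checked against its sequence line. The following is a minimal Python sketch, assuming a few entries have been transcribed by hand into a dictionary; the names `LISTING` and `length_mismatches` are illustrative and are not part of the listing format. Grouping spaces are ignored and each N is counted as one base.

```python
# Hypothetical hand-transcribed subset: SEQ ID NO -> (declared length, sequence line).
LISTING = {
    619: (11, "TTCCTCCTTT C"),
    621: (37, "GCGTANNNNN CGTTCGCACT TNNNNCTTCG TCCCAAT"),
    631: (38, "GCAGAATTCT GCAGCGTTCG CACTTTCTAG AGCTCAGG"),
    633: (42, "GCAGAATTCT GCAGNNNNCG TTCGCACTTT CTAGAGCTCA GG"),
    638: (54, "CGTGAATTCT GCAGATGAGG TACCNNNNNN CGTTCGCACT TTCTAGAGCT CTCC"),
    641: (11, "CGTTCTCACT T"),
}


def length_mismatches(listing):
    """Return {seq_id: (declared, actual)} for entries whose lengths disagree."""
    mismatches = {}
    for seq_id, (declared, sequence) in listing.items():
        actual = len(sequence.replace(" ", ""))  # drop the 10-base grouping spaces
        if actual != declared:
            mismatches[seq_id] = (declared, actual)
    return mismatches


print(length_mismatches(LISTING))
```

An empty result means every transcribed entry agrees with its declared length; any entry reported here is worth re-reading against the original page.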
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IT IS CLAIMED: 

1. A method of constructing a DNA-binding agent 
capable of sequence-specific binding to a duplex DNA 
target region, comprising: 

identifying in the duplex DNA, a target 
region containing a series of at least two non-overlapping 
base-pair sequences of four base-pairs each, where 
the four base-pair sequences are adjacent and each 
sequence is characterized by sequence-preferential 
binding to a small molecule, and 

coupling the small molecules to form a DNA-binding 
agent capable of sequence-specific binding to 
said target region. 

2. The method of claim 1, where the duplex 
binding small molecules are identified as molecules 
capable of binding to a selected test sequence in a 
duplex DNA by: 

(i) adding a molecule to be screened to a 
test system composed of (a) a duplex DNA test oligonucleotide 
having a screening sequence adjacent a 
selected test sequence, where a DNA binding protein is 
effective to bind to said screening sequence with a 
binding affinity that is substantially independent of 
such test sequence, but where DNA protein binding to 
the screening sequence is sensitive to binding of test 
molecules to such test sequence, and (b) said DNA-binding 
protein, 

(ii) incubating the molecule in the test 
system for a period sufficient to permit binding of the 
molecule being tested to the test sequence in the 
duplex DNA, and 
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(iii) comparing the amount of binding 
protein bound to the duplex DNA test oligonucleotide 
before and after said adding. 

3. The method of claim 2, where the screening 
sequence is from the HSV origin of replication and the 
binding protein is UL9. 

4. The method of claim 3, wherein the DNA 
screening sequence is selected from the group consisting 
of SEQ ID NO:601, SEQ ID NO:602, SEQ ID NO:615, and 
SEQ ID NO:641. 

5. The method of claim 2, where said comparing 
is accomplished using either a gel band-shift assay or 
a filter-binding assay. 

6. The method of claim 1, where the subunits are 
the same and the DNA-binding agent is a homopolymer. 

7. The method of claim 1, where the subunits are 
different and the DNA-binding agent is a heteropolymer. 

8. The method of claim 1, where the four base 
pair sequences are separated by at least 1-6 base-pairs. 

9. The method of claim 1, where said DNA-binding 
small molecules are coupled to each other using a 
spacer molecule. 

10. The method of claim 1, where the two sequences 
are selected from the group of sequences consisting 
of TTTC, TTTG, TTAC, TTAG, TTGC, TTGG, TTCC, TTCG, 
TATC, TATG, TAAC, TAAG, TAGC, TAGG, TACC, TAGC sequences. 

11. The method of claim 10, where the duplex DNA-binding 
small molecule is distamycin. 

12. A method of blocking transcription activity 
from a duplex DNA template, comprising 

identifying in the duplex DNA a binding site 
for a transcription factor and, adjacent the binding 
site, a target region having a series of at least two 
non-overlapping base-pair sequences of four base-pairs 
each, where the four base-pair sequences are adjacent 
and each sequence is characterized by sequence-preferential 
binding to a small molecule, and 

contacting the duplex DNA with a binding 
agent composed of the small molecules coupled to form 
a DNA-binding agent capable of sequence-specific 
binding to said target region. 

13. The method of claim 12, where the target 
region is selected from DNA sequences adjacent a 
binding site for a eucaryotic transcription factor. 

14. The method of claim 13, where the transcription 
factor is TFIID. 

15. The method of claim 12, where the target 
region is selected from DNA sequences adjacent a 
binding site for a procaryotic transcription factor. 

16. The method of claim 15, where the transcrip- 
tion factor is a sigma factor. 
17. A DNA-binding agent capable of binding with 
base-sequence specificity to a target region in duplex 
DNA, where the target region contains at least two 
adjacent four base-pair sequences, comprising 

at least two subunits, where each subunit is 
a small molecule and has a sequence-preferential 
binding affinity for a sequence of four base-pairs in 
the target region, and 

where the subunits are coupled to form a DNA-binding 
agent capable of sequence-specific binding to 
said target region. 

18. A method of constructing a binding agent 
capable of sequence-specific binding to a duplex DNA 
target region, comprising: 

identifying in the duplex DNA, a target 
region containing (i) a series of at least two adjacent 
non-overlapping base-pair sequences of four base-pairs 
each, where each four base-pair sequence is characterized 
by sequence-preferential binding to a small 
molecule, and (ii) adjacent to (i) a DNA duplex region 
capable of forming a triplex with a third-strand 
oligonucleotide, 

coupling the small molecules to form a DNA-binding 
agent capable of sequence-specific binding to 
said target region, and 

attaching the DNA-binding agent to a third 
strand oligonucleotide. 

19. The method of claim 18, where binding of the 
DNA-binding agent to duplex DNA causes a shift from B 
form to A form DNA. 

20. A DNA-binding agent capable of binding with 
base-sequence specificity to a duplex DNA target region 



WO 94/14980 



PCT/US93/12388 



538 

containing two sites, a first site having at least two 
adjacent four base pair sequences, and a second site 
capable of forming a triplex with a third-strand 
oligonucleotide, said DNA-binding agent comprising 

at least two subunits, where each subunit is 
a small molecule and has a sequence-preferential 
binding affinity for a sequence of four base-pairs in 
the target region, where the subunits are coupled to 
form a DNA-binding agent capable of sequence-specific 
binding to said first site, and 

a third strand capable of forming a triplex 
with the second site, 

where the third strand is attached to the 
DNA-binding agent. 

21. A method of ordering the sequence binding 
preferences of a DNA-binding molecule, comprising 

(i) adding a molecule to be screened to a 
test system composed of (a) a duplex DNA test oligonucleotide 
having a screening sequence adjacent a 
selected test sequence, where a DNA binding protein is 
effective to bind to said screening sequence with a 
binding affinity that is substantially independent of 
such test sequence, but where DNA protein binding to 
the screening sequence is sensitive to binding of test 
molecules to such test sequence, and (b) said DNA-binding 
protein, 

(ii) incubating the molecule in the test 
system for a period sufficient to permit binding of the 
molecule being tested to the test sequence in the 
duplex DNA, 

(iii) comparing the amount of binding 
protein bound to the duplex DNA test oligonucleotide 
before and after said adding, 




(iv) repeating steps (i) and (iii) using 
duplex DNA test oligonucleotides containing all test 
sequences of interest, and 

(v) ordering the relative amounts of protein 
bound to each duplex DNA test oligonucleotide in the 
presence of the molecule for each test sequence. 
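
Steps (iii) to (v) amount to computing, for each test sequence, how much protein remains bound once the molecule is added, and then sorting the test sequences by that value. The following is a minimal Python sketch under stated assumptions: the function name `rank_test_sequences`, the use of a simple after/before ratio, and the sample readings are all hypothetical and are not taken from this document.

```python
def rank_test_sequences(bound_before, bound_after):
    """Order test sequences by the fraction of protein still bound after adding the molecule.

    Both arguments map test sequence -> measured amount of protein-bound DNA
    (any consistent unit). A lower fraction means more protein was displaced.
    """
    fractions = {}
    for sequence, before in bound_before.items():
        if before <= 0:
            raise ValueError(f"no protein bound before addition for {sequence}")
        fractions[sequence] = bound_after[sequence] / before
    # Most strongly affected test sequence first.
    return sorted(fractions.items(), key=lambda item: item[1])


# Made-up readings for illustration only.
before = {"TTAC": 100.0, "GGGC": 100.0, "ATAT": 80.0}
after = {"TTAC": 20.0, "GGGC": 95.0, "ATAT": 40.0}

for sequence, fraction in rank_test_sequences(before, after):
    print(sequence, round(fraction, 2))
```

Sorting on a ratio rather than on the raw amount keeps test oligonucleotides with different starting levels of bound protein comparable.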

22. The method of claim 21, where the test 
sequences are selected from the group of 256 possible 
four base sequences composed of A, G, C and T. 
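
The figure of 256 follows from four positions with four possible bases each (4^4 = 256). The following is a minimal Python sketch that enumerates the sequences and places each one into a test oligonucleotide; the template is copied from SEQ ID NO: 633, while the helper names `all_test_sequences` and `fill_test_site` are illustrative assumptions.

```python
from itertools import product

BASES = "AGCT"


def all_test_sequences(length=4):
    """Return every sequence of the given length over A, G, C and T."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def fill_test_site(template, test_sequence):
    """Replace the single run of N placeholders in `template` with `test_sequence`."""
    placeholder = "N" * len(test_sequence)
    if template.count(placeholder) != 1:
        raise ValueError("template must contain exactly one matching run of N")
    return template.replace(placeholder, test_sequence)


# SEQ ID NO: 633: a four-base test site (NNNN) next to the UL9 screening sequence.
TEMPLATE = "GCAGAATTCTGCAGNNNNCGTTCGCACTTTCTAGAGCTCAGG"

sequences = all_test_sequences()
oligos = [fill_test_site(TEMPLATE, s) for s in sequences]
print(len(sequences), oligos[0])
```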

23. A method for altering the binding characteristics 
of a DNA-binding protein to a duplex DNA, 
comprising 

identifying in the duplex DNA (i) a binding site 
for the DNA-binding protein, where said site comprises 
a series of contiguous paired nucleotides, and (ii) a 
target region adjacent the binding site, 

selecting a small molecule characterized by 
sequence-preferential binding to the target region, 
where, when the small molecule is bound to the target 
region, the small molecule is adjacent to the site for 
the DNA-binding protein or overlapping the site for the 
DNA-binding protein by at least one nucleotide pair, 
and 

contacting the duplex DNA with the small molecule 
at a concentration effective to alter binding of the 
DNA-binding protein to its binding site. 

24. The method of claim 23, where contacting the 
duplex DNA with the small molecule inhibits the binding 
of the DNA-binding protein to its binding site. 
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25. The method of claim 23, where contacting the 
duplex DNA with the small molecule enhances the binding 
of the DNA-binding protein to its binding site. 

26. The method of claim 23, where the DNA binding 
protein is a eucaryotic general transcription factor 
and the target region is selected from DNA sequences 
adjacent the binding site for the eucaryotic transcription 
factor. 

27. The method of claim 26, where the transcrip- 
tion factor is TFIID. 

28. The method of claim 27, where the region is 
selected from the group of DNA sequences consisting of 
SEQ ID NO:1 to SEQ ID NO:600. 

29. The method of claim 23, where the DNA binding 
protein is a eucaryotic general transcription factor 
and the small molecule binds, in addition to the target 
region, one to three nucleotide pairs of the DNA-binding 
protein's binding site. 

30. The method of claim 29, where the eucaryotic 
general transcription factor is TFIID, and the small 
molecule binds to (i) the target region, and (ii) up to 
two nucleotides of the binding site for the eucaryotic 
transcription factor, where the nucleotides are 
contiguous to the target region. 

31. The method of claim 23, where the DNA binding 
protein is a DNA replication factor. 
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32. A method of identifying test sequences in 
duplex DNA to which binding of a test molecule is most 
preferred, comprising 

(i) constructing a mixture of duplex DNA test 
oligonucleotides, where each oligonucleotide has (a) a 
screening sequence adjacent (b) a test sequence, where 
a DNA binding protein is effective to bind to said 
screening sequence with a binding affinity that is 
substantially independent of such test sequence, but 
where DNA protein binding to the screening sequence is 
sensitive to binding of test molecules to such test 
sequence, and (c) where test oligonucleotides of the 
mixture contain different test sequences, 

(ii) adding a test molecule to be screened to a 
test reaction composed of (a) said DNA binding protein, 
and (b) said duplex DNA test oligonucleotide mixture, 

(iii) incubating the molecule in the test reaction 
for a period sufficient to permit binding of the 
molecule being tested to test sequences in the duplex 
DNA, 

(iv) separating test oligonucleotides from test 
oligonucleotides bound to binding protein, 

(v) amplifying the separated test oligonucleotides, 

(vi) repeating steps (ii) to (v), 

(vii) isolating the amplified test oligonucleotides, 

(viii) sequencing the isolated test oligonucleotides. 

33. The method of claim 32, where said test 
sequences are selected from the group of 256 possible 
four base sequences composed of A, G, C and T. 
34. The method of claim 32, where said constructing 
includes selecting test sequences from the sequences 
presented as SEQ ID NO:1 to SEQ ID NO:600. 

35. The method of claim 32, where in constructing 
the mixture of test oligonucleotides, said adjacent 
screening and test sequences are flanked by primer 
sequences. 

36. The method of claim 35, wherein said amplifying 
is carried out by successively repeating the steps 
of (a) denaturing the duplex test oligonucleotides to 
produce single-strand fragments, (b) hybridizing the 
single strands with primers, complementary to the 
primer sequences in the oligonucleotides, to form 
strand/primer complexes, (c) generating double-strand 
fragments from the strand/primer complexes in the 
presence of DNA polymerase and all four deoxyribonucleotides, 
and (d) repeating steps (a) to (c) until 
a desired degree of amplification has been achieved. 
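
Step (d) leaves the number of repetitions open. As a rough guide, the cycle count can be estimated by assuming that each pass through steps (a) to (c) multiplies the amount of duplex by (1 + efficiency), where an efficiency of 1.0 means perfect doubling. The following minimal Python sketch rests on that assumption; the function name `cycles_needed` and the `efficiency` parameter are illustrative and do not come from the claims.

```python
def cycles_needed(target_fold, efficiency=1.0):
    """Return the smallest number of cycles giving at least `target_fold` amplification.

    Assumes every cycle multiplies the template amount by (1 + efficiency),
    with 0 < efficiency <= 1 (1.0 is ideal doubling).
    """
    if target_fold < 1:
        raise ValueError("target_fold must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    cycles = 0
    fold = 1.0
    while fold < target_fold:
        fold *= 1.0 + efficiency
        cycles += 1
    return cycles


print(cycles_needed(1_000_000))        # ideal doubling
print(cycles_needed(1_000_000, 0.8))   # a less efficient reaction needs more cycles
```

The loop multiplies step by step instead of taking a logarithm, which avoids floating-point rounding surprises when the target is an exact power of two.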

37. The method of claim 32, wherein said amplifying 
is carried out by cloning the separated test oligonucleotides 
into a vector, passaging vectors carrying 
the test oligonucleotides in appropriate host cells, 
culturing the host, isolating the vectors, and obtaining 
the test oligonucleotides from the vectors. 

38. The method of claim 32, where said isolating 
is accomplished by cloning the amplified test oligonucleotides 
into a cloning vector. 

39. The method of claim 32, where said separating 
is accomplished by passing the test reaction through a 
filter, where said filter is capable of capturing 






DNA: protein complexes but not DNA that is free of 
protein. 

40. The method of claim 32, where the DNA 
screening sequence is from the HSV origin of replication 
and the binding protein is UL9. 

41. A method of screening for molecules capable 
of binding to a selected test sequence in a duplex DNA, 
comprising 

(i) constructing a duplex DNA test oligonucleotide 
having a screening sequence adjacent a selected test 
sequence, where a DNA binding protein is effective to 
bind to said screening sequence with a binding affinity 
that is substantially independent of such test sequence, 
but where DNA protein binding to the screening 
sequence is sensitive to binding of test molecules to 
such test sequence, 

(ii) adding a test molecule to be screened to a 
test system composed of (a) said DNA binding protein, 
and (b) said duplex DNA test oligonucleotide having 
said screening and test sequences adjacent one another, 

(iii) incubating the molecule in the test system 
for a period sufficient to permit binding of the 
molecule being tested to the test sequence in the 
duplex DNA, and 

(iv) comparing the amount of binding protein bound 
to the duplex DNA before and after said adding. 

42. The method of claim 41, where said test 
sequence is selected from the group consisting of SEQ 
ID NO:1 to SEQ ID NO:600. 




43. The method of claim 41, where the DNA 
screening sequence is from the HSV origin of replica- 
tion and the binding protein is UL9. 

44. The method of claim 43, wherein the DNA 
screening sequence is selected from the group consisting 
of SEQ ID NO:601, SEQ ID NO:602, SEQ ID NO:615, and 
SEQ ID NO:641. 
5'-GTGAGAATTCGAAGCGTTCGCACTTCGTCCCAAT-3' 
                      3'-GAAGCAGGGTTA-5' 
Screening   Test 
Sequence    Sequence   Test Oligonucleotide 

UL9         Z-DNA 
            Z-DNA 
UL9         CCCG       5'-GGCCCGCCCCGTTCGCACTTCCCGCCCCGG-3' 
UL9         GGGC       5'-GGCGGGCGCCGTTCGCACTTGGGCGGGCGG-3' 
UL9         ATAT       5'-GGATATATACGTTCGCACTTTAATTATTGG-3' 
UL9         polyA      5'-GGAAAAAAACGTTCGCACTTAAAAAAAAGG-3' 
UL9         polyT      5'-GGTTTTTTTCGTTCGCACTTTTTTTTTTGG-3' 
UL9         GCAC       5'-GGACGCACGCGTTCGCACTTGCAGCAGCGG-3' 
ATori-1                5'-GCGTATATATCGTTCGCACTTCGTCCCAAT-3' 
oriEco2                5'-GGCGAATTCGACGTTCGCACTTCGTCCCAAT-3' 
oriEco3                5'-GGCGAATTCGATCGTTCGCACTTCGTCCCAAT-3' 

Fig. 5 
          Screening 
          Sequence 
GCGTANXXXXCGTTCGCACTTXXXXCTTCGTCCCAAT 
CGCATNXXXXGCAAGCGTGAAXXXXGAAGCAGGGTTA 
      Test           Test 
      Site           Site 

Fig. 14B 
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Distamycin: rank vs r% score 
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Actinomycin D Screens 



Actinomycin D: Rank vs. average r% 
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Fig. 20 
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Actinomycin D: 


Variance from mean 
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Fig. 25 
Fig. 27A 

             >---UL9--->                       Score 
5'-GCGTANXYZZCGTTCGCACTTXYZZCTTCGTCCCAAT-3'    high 
3'-CGCATNYXQQGCAAGCGTGAAYXQQGAAGCAGGGTTA-5' 

Fig. 27B 

             >---UL9--->                       Score 
5'-GCGTANQQXYCGTTCGCACTTQQXYCTTCGTCCCAAT-3'    low 
3'-CGCATNZZYXGCAAGCGTGAAZZYXGAAGCAGGGTTA-5' 

Fig. 27C 

5'-GCGTANXYZZTTCACGCTTGCXYZZCTTCGTCCCAAT-3' 
3'-CGCATNYXQQAAGTGCGAACGYXQQGAAGCAGGGTTA-5' 
             <---UL9---< 

Fig. 27D 

5'-GCGTANQQXYTTCACGCTTGCQQXYCTTCGTCCCAAT-3' 
3'-CGCATNZZYXAAGTGCGAACGZZYXGAAGCAGGGTTA-5' 
             <---UL9---< 

Fig. 27E 

             >---UL9---> 
5'-GCGTANXYZZCGTTCGCACTTQQXYCTTCGTCCCAAT-3' 
3'-CGCATNYXQQGCAAGCGTGAAZZYXGAAGCAGGGTTA-5' 

Fig. 27F 

             >---UL9---> 
5'-GCGTANQQXYCGTTCGCACTTXYZZCTTCGTCCCAAT-3' 
3'-CGCATNZZYXGCAAGCGTGAAYXQQGAAGCAGGGTTA-5' 

5'-GCGTANXYZZTTCACGCTTGCQQXYCTTCGTCCCAAT-3' 
3'-CGCATNYXQQAAGTGCGAACGZZYXGAAGCAGGGTTA-5' 
             <---UL9---< 

5'-GCGTANQQXYTTCACGCTTGCXYZZCTTCGTCCCAAT-3' 
3'-CGCATNZZYXAAGTGCGAACGYXQQGAAGCAGGGTTA-5' 
             <---UL9---< 
GTTAGAGTGG AGGTTTGACA GCCGCCTAGC ATTTCATCAC ATGGCCCGAG 
AGCTGCATCC GGAGTACTTC AAGAACTGCT GACATCGAGC TTGCTACAAG 
GGACTTTCCG CTGGGGACTT TCCAGGGAGG CGTGGCCTGG GCGGGACTGG 
GGAGTGGCGA GCCCTCAGAT CCTGCATATA AGCAGCTGCT TTTTGCCTGT 

     +1 primary transcript start --> 
ACTG ggtctctctggttagaccagatctgagcctgggagctc 

Fig. 28 
Fig. 29A 

EcoRI/PstI 
primer 
5'-GCAGAATTCTGCAG-3'      UL9 site 
5'-GCAGAATTCTGCAG(N)XCGTTCGCACTTTCTAGAGCTCAGG-3' 
3'-CGTCTTAAGACGTC(N)XGCAAGCGTGAAAGATCTCGAGTCC-5' 
                 test 
                 site 
                             3'-AGATCTCGAGTCC-5' 
                                XbaI/SacI 
                                primer 

where X is the number of bases in the test site. 

Fig. 29B 

5'-GCAGAATTCTGCAGNNNNCGTTCGCACTTTCTAGAGCTCAGG-3' 

Fig. 29C 

5'-GCAGAATTCTGCAGNNNNNNNNCGTTCGCACTTTCTAGAGCTCAGG-3' 

Fig. 29D 

5'-GCAGAATTCTGCAGCGTTCGCACTTNNNNNNNNTCTAGAGCTCAGG-3' 
UL9 site 3' relative to the test sequence: 

EcoRI/PstI primers 
5'-CGTGAATTCTGCAG-3' 
5'-CGTGAATTCTGCAGATG-3' 

                     Asp718/RsaI/KpnI 
                     restriction site 
                                 UL9 site 
5'-CGTGAATTCTGCAGATGAGGTACCNNNNNNCGTTCGCACTTTCTAGAGCTCTCC-3' 
                           test 
                           site 
                                    3'-GTGAAAGATCTCGAGAGG-5' 
                                        3'-AAGATCTCGAGAGG-5' 
                                       XbaI/SacI primers 

Fig. 30 

Small Molecule Binding           Expected Score   Potential Test 
Sequence                         in Assay         Site Sequence 

           UL9 site 
5'-...CGTTCGCACTTTTAC...-3'      high             TTAC 
5'-...CGTTCGCACTTTACN...-3'      high             TACN 
5'-...CGTTCGCACTTACNN...-3'      high             ACNN 

Fig. 31 

Small Molecule Binding           Expected Score   Potential Test 
Sequence                         in Assay         Site Sequence 

      SmaI 
5'-...CCCGGGTTAC...-3'           high             TTAC 
5'-...CCCGGGTACN...-3'           low              TACN 
5'-...CCCGGGACNN...-3'           low              ACNN 
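
The two tables above differ only in the sequence that flanks the test site: a flank ending in TT can supply the first bases of a TTAC binding sequence, whereas a flank ending in GGG cannot. The following minimal Python sketch reproduces that comparison; the function name `motif_spans_test_site` is illustrative, and N is treated as an unknown base that never counts as a match.

```python
def motif_spans_test_site(flank, test_site, motif):
    """Return True if `motif` occurs in flank + test_site and covers at least one test-site base."""
    combined = flank + test_site
    start = combined.find(motif)
    while start != -1:
        # The match must extend past the end of the flank into the test site.
        if start + len(motif) > len(flank):
            return True
        start = combined.find(motif, start + 1)
    return False


UL9_SITE = "CGTTCGCACTT"   # screening sequence, ends in TT
SMAI_SITE = "CCCGGG"       # SmaI site, ends in GGG

for flank_name, flank in (("UL9", UL9_SITE), ("SmaI", SMAI_SITE)):
    for test_site in ("TTAC", "TACN", "ACNN"):
        score = "high" if motif_spans_test_site(flank, test_site, "TTAC") else "low"
        print(flank_name, test_site, score)
```

Under these assumptions the sketch reports "high" for all three test sites next to the UL9 site and only for TTAC next to the SmaI site, matching the expected scores in the tables.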
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